Search results - BazTech

1

Detection of Sentence Boundaries in Polish Based on Acoustic Cues

Igras M., Ziółko B.

Archives of Acoustics | 2016 | Vol. 41, No. 2 | 233--243

EN

This article presents experiments on annotating sentence boundaries in Polish speech using acoustic cues as the source of information. The main result is an algorithm for detecting syntactic boundaries at the positions of punctuation marks. In the first stage, the algorithm detects pauses and divides the speech signal into segments. In the second stage, it examines the configuration of acoustic features and hypothesizes the positions of punctuation marks. Classification uses parameters describing phone duration and energy, speaking rate, fundamental frequency contours and frequency bands. The best results were achieved with a Naive Bayes classifier: 52% precision and 98% recall. Another significant outcome of the research is a set of statistical models of acoustic cues correlated with punctuation in spoken Polish.
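The first stage described in the abstract (pause detection followed by segmentation) can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `split_on_pauses`, the per-frame energy input, the fixed threshold and the minimum pause length are all assumptions made for illustration.

```python
def split_on_pauses(energies, threshold, min_pause_frames):
    """Split a sequence of per-frame energies into speech segments.

    A pause is assumed to be a run of at least `min_pause_frames`
    consecutive frames whose energy is below `threshold`.
    Returns a list of (start, end) frame indices, end exclusive.
    """
    segments = []
    seg_start = None    # first frame of the segment being built
    last_voiced = None  # last frame at or above the threshold
    silent_run = 0      # length of the current low-energy run

    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            last_voiced = i
            silent_run = 0
        else:
            silent_run += 1
            # Close the segment only once the silence is long enough
            # to count as a pause; shorter gaps stay inside the segment.
            if seg_start is not None and silent_run >= min_pause_frames:
                segments.append((seg_start, last_voiced + 1))
                seg_start = None

    if seg_start is not None:
        segments.append((seg_start, last_voiced + 1))
    return segments


# Hypothetical energies: a one-frame dip is ignored, a three-frame gap splits.
frames = [0, 5, 6, 0, 7, 0, 0, 0, 8, 9, 0]
segments = split_on_pauses(frames, threshold=1, min_pause_frames=3)
```

In a pipeline like the one the abstract outlines, each segment edge found this way would be a candidate position that a second-stage classifier accepts or rejects as a punctuation mark.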

2

Speech emotion recognition system for social robots

Juszkiewicz Ł.

Journal of Automation Mobile Robotics and Intelligent Systems | 2013 | Vol. 7, No. 4 | 59--65

EN

The paper presents a speech emotion recognition system for social robots. Emotions are recognised using global acoustic features of the speech. The system implements speech parameter calculation, feature extraction, feature selection and classification, and each of these phases is described. The system was verified using two emotional speech databases, one Polish and one German. Prospects for using such a system in social robots are presented.
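As a rough illustration of what "global acoustic features" can mean, the sketch below summarises a whole-utterance contour (for example fundamental frequency per frame) with a few statistics. The feature set, the function name `global_features` and the use of `None` for unvoiced frames are assumptions; the abstract does not list the features the system actually uses.

```python
import statistics


def global_features(contour):
    """Summarise a frame-level contour with utterance-level statistics.

    `contour` is a list of per-frame values; None marks frames without
    a value (e.g. unvoiced frames in an F0 contour) and is skipped.
    """
    values = [v for v in contour if v is not None]
    if not values:
        raise ValueError("contour contains no usable frames")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
    }


# Hypothetical F0 values in Hz, with one unvoiced frame.
features = global_features([100.0, None, 120.0, 140.0])
```

Statistics of this kind, computed over several contours, would form the fixed-length vector that feature selection and classification then operate on.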

3

Exploiting Prosody for Automatic Syntactic Phrase Boundary Detection in Speech

Szaszák G., Beke A.

Journal of Language Modelling | 2012 | Vol. 0, No. 1 | 143--172

EN

The relation between syntax and prosody is evident, even though prosodic structure cannot be mapped directly onto syntactic structure or vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition and understanding systems. This paper presents an experiment towards filling this gap, evaluating whether an HMM-based automatic prosodic segmentation tool can support the reconstruction of syntactic structure directly from speech. Results show that up to 85% of syntactic clause boundaries and up to about 70% of embedded syntactic phrase boundaries could be identified based on the detection of phonological phrases. Recall rates do not depend further on syntactic layering, that is, on whether the phrase is multiply embedded or not. In read speech, clause boundaries can be reliably assigned to the intonational phrase level and separated from lower-level syntactic phrases based on the type of the aligned phonological phrase(s). These findings can be exploited in speech understanding systems, allowing the skeleton of the syntactic structure to be recovered purely from the speech signal.
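The recall figures quoted above are the share of reference boundaries that the detector finds. A minimal sketch of such a measurement follows; the time tolerance, the function name `boundary_recall` and the matching rule are assumptions, since the abstract does not state how boundaries were aligned.

```python
def boundary_recall(reference, detected, tolerance):
    """Fraction of reference boundaries with a detection nearby.

    `reference` and `detected` are boundary times in seconds. A reference
    boundary counts as found if any detected boundary lies within
    `tolerance` seconds of it. This simple rule lets one detection match
    several reference boundaries; a stricter evaluation would pair them
    one to one.
    """
    if not reference:
        raise ValueError("no reference boundaries given")
    hits = sum(
        1
        for ref in reference
        if any(abs(ref - det) <= tolerance for det in detected)
    )
    return hits / len(reference)


# Hypothetical boundary times: two of the four reference boundaries are found.
recall = boundary_recall(
    reference=[1.0, 2.5, 4.0, 6.0],
    detected=[1.05, 2.9, 4.1],
    tolerance=0.15,
)
```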

4

Syntetyzer mowy uwzględniający prozodię wypowiedzi [A speech synthesizer that accounts for utterance prosody]

Łopatka K., Czyżewski A.

Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej | 2010 | No. 28 | 105--108

EN

The paper presents a text-to-speech synthesizer for Polish that automatically accounts for prosody, i.e. the intonation contour, tempo and stress of an utterance. The speech signal is produced by concatenative synthesis from diphones, speech units that contain the transition between two phones. The individual modules of the synthesizer are described: text processing, the database of speech units, and the algorithms used to build the synthesized signal. Subjective listening tests confirmed the high intelligibility and naturalness of the generated speech and the effectiveness of the prosodic modifications. The system can be used in educational and therapeutic applications and in multimodal interfaces for people with disabilities.
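To make the diphone idea concrete, the sketch below turns a phone sequence into the list of diphone unit names that a concatenative synthesizer would look up in its unit database. The naming scheme (`a-b`), the silence symbol `_` and the function name `to_diphones` are illustrative assumptions, not details taken from the paper.

```python
def to_diphones(phones, silence="_"):
    """Return the diphone names covering a phone sequence.

    Each diphone spans the transition from one phone to the next.
    The sequence is padded with a silence symbol on both sides so that
    the first and last phones also get a transition unit.
    """
    if not phones:
        return []
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# The Polish word "kot" (cat) as three phones.
units = to_diphones(["k", "o", "t"])
```

A sequence of n phones yields n + 1 diphones, so the joins between recorded units fall in the middle of phones rather than at the transitions between them.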