Search results - BazTech

1

Detection of Sentence Boundaries in Polish Based on Acoustic Cues

Igras M., Ziółko B.

Archives of Acoustics | 2016 | Vol. 41, No. 2 | 233-243

EN

This article presents experiments on annotating sentence boundaries in Polish speech using acoustic cues as the source of information. The main result is an algorithm that detects syntactic boundaries occurring where punctuation marks would be placed. In the first stage, the algorithm detects pauses and divides the speech signal into segments. In the second stage, it examines the configuration of acoustic features and proposes hypotheses about the positions of punctuation marks. Classification uses parameters describing phone duration and energy, speaking rate, fundamental frequency contours and frequency bands. The best results were achieved with a Naive Bayes classifier. The algorithm achieves 52% precision and 98% recall. Another significant outcome of the research is a set of statistical models of acoustic cues correlated with punctuation in spoken Polish.
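The first stage described above (pause detection followed by segmentation) can be illustrated with a minimal sketch. It is not the authors' implementation: the frame-energy criterion, the function names `frame_energies` and `split_on_pauses`, and all parameter values are assumptions chosen only to show the idea of cutting a signal at sufficiently long low-energy runs.

```python
from typing import List, Sequence, Tuple


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def split_on_pauses(
    signal: Sequence[float],
    frame_len: int,
    energy_threshold: float,   # assumed: frames below this count as silence
    min_pause_frames: int,     # assumed: shorter silences do not split
) -> List[Tuple[int, int]]:
    """Return speech segments as (start_frame, end_frame) with end exclusive.

    A pause is a run of at least `min_pause_frames` low-energy frames.
    """
    energies = frame_energies(signal, frame_len)
    segments: List[Tuple[int, int]] = []
    start = None       # first frame of the segment currently being built
    silent_run = 0     # consecutive low-energy frames seen inside a segment
    for idx, energy in enumerate(energies):
        if energy >= energy_threshold:
            if start is None:
                start = idx
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # Close the segment at the last frame that contained speech.
                segments.append((start, idx - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        segments.append((start, len(energies) - silent_run))
    return segments
```

Each returned segment would then be passed to the second stage, where acoustic features are computed around the segment boundary and a classifier decides whether a punctuation mark is hypothesized there.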

2

Baza danych nagrań mowy emocjonalnej [Database of emotional speech recordings]

Igras M., Ziółko B.

Studia Informatica | 2013 | Vol. 34, No. 2B | 67-77

PL (translated)

The article presents a database of emotional speech recordings developed at AGH and collected for research on the affective content of the speech signal. It describes the recording procedure, parameters, structure, metadata and license of the database. Example applications are presented: developing methods for detecting emotional states in the voice and normalizing recordings for ASR.

EN

The paper presents a database of emotional speech recordings collected at AGH for research on the affective content of the speech signal. We describe the method of data acquisition and the parameters, structure, metadata and license of the database. We present example applications: developing methods for detecting emotions in the voice and normalizing emotional speech for ASR.

3

Baza danych nagrań mowy dla analizy porównawczej różnojęzycznych fonemów [Speech recording database for comparative analysis of phonemes across languages]

Mąsior M., Igras M., Ziółko M., Kacprzak S.

Studia Informatica | 2013 | Vol. 34, No. 2B | 79-87

PL (translated)

The article presents a system for collecting, archiving and acoustically analyzing multilingual speech samples. The main aim of the research is a comparative analysis of phonemes across several hundred languages and the construction of a genealogical tree of the world's languages. The implementation of the system as a database with a web portal is described. Information is given on the content and form of the database, its development prospects and its applications in computational linguistics.

EN

The paper presents a system for collecting and analyzing multilingual speech samples for research on the characteristics of phonemes in several hundred world languages. We describe the implementation, which consists of a database and a web page. The content and form of the database are presented, along with its applications in developing new methods of speech analysis.

4

Detection of disfluencies in speech signal

Barczewska K., Igras M.

Challenges of Modern Technology | 2013 | Vol. 4, No. 2 | 3-10

EN

During public presentations or interviews, speakers commonly and unconsciously overuse interjections or filled pauses, which interfere with speech fluency and negatively affect listeners' impressions and speech perception. Types of disfluencies and methods of detecting them are reviewed. The authors carried out a survey whose results indicated the elements most adverse for the audience. The article presents an approach to automatic detection of the most common type of disfluency, filled pauses. A base of filled-pause patterns (prolonged I, prolonged e, mm, Im, xmm, in SAMPA notation) was collected from 72 minutes of recordings of public presentations and interviews by six speakers (3 male, 3 female). A statistical analysis of the length and frequency of occurrence of such interjections in the recordings is presented. Each pattern from the training set was then described by the mean values of its first and second formants (F1 and F2). Detection was performed on a test set of recordings by recognizing the phonemes from the two formants, with a recognition efficiency of about 68%. The results of research on detecting disfluencies in speech may be applied in a system that analyzes speech and provides feedback on the imperfections that occurred, in order to help in training oratorical skills. A conceptual prototype of such an application is proposed. Moreover, a base of patterns of the most common disfluencies can be used in speech recognition systems to avoid interjections during speech-to-text transcription.
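The idea of describing each filled-pause pattern by its mean F1 and F2 and then matching test frames against those patterns can be sketched as below. This is only an illustration under assumptions: the abstract does not state the matching rule, so nearest-pattern matching by Euclidean distance, the rejection threshold `max_distance`, the function names, and every formant value in the usage example are hypothetical placeholders, not measurements from the paper.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Formants = Tuple[float, float]  # (F1, F2) in Hz


def mean_formants(samples: Sequence[Formants]) -> Formants:
    """Describe a pattern by the mean F1 and mean F2 of its training samples."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)


def classify_frame(
    f1: float,
    f2: float,
    patterns: Dict[str, Formants],
    max_distance: float,  # assumed rejection threshold in Hz
) -> Optional[str]:
    """Return the label of the closest pattern, or None if none is close enough."""
    def distance(label: str) -> float:
        p1, p2 = patterns[label]
        return math.hypot(f1 - p1, f2 - p2)

    best = min(patterns, key=distance)
    return best if distance(best) <= max_distance else None


# Usage with placeholder numbers (not values reported in the paper):
patterns = {
    "e": mean_formants([(500.0, 1800.0), (540.0, 1900.0)]),
    "mm": mean_formants([(280.0, 1100.0), (320.0, 1300.0)]),
}
label = classify_frame(510.0, 1800.0, patterns, max_distance=200.0)
```

Returning `None` for frames far from every pattern is one simple way to keep ordinary speech from being labeled as a filled pause; a real system would also need to check that the match persists over enough consecutive frames.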

5

Audiowizualna baza nagrań mowy polskiej [Audiovisual database of Polish speech recordings]

Igras M., Ziółko B., Jadczyk T.

Studia Informatica | 2012 | Vol. 33, No. 2B | 163-172

PL (translated)

The authors present the largest audiovisual database of Polish speech, which is also the only one recorded in HD quality. The article gives a short description of similar databases for other languages and a technical description of the database. The challenges encountered while building the database and its planned applications are also discussed.

EN

The biggest audiovisual database of Polish speech, and the only one made in HD quality, is presented. The paper briefly describes similar databases for other languages and gives the technical specification of the AGH database. The challenges met while building the database are discussed along with its planned applications.

6

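Pomiary parametrów akustycznych mowy emocjonalnej - krok ku modelowaniu wokalnej ekspresji emocji [Measurements of acoustic parameters of emotional speech: a step towards modelling the vocal expression of emotions]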

Igras M., Wszołek W.

Pomiary Automatyka Kontrola | 2012 | Vol. 58, No. 4 | 335-338

PL (translated)

This work attempts to measure features of the speech signal that are correlated with its emotional content (using basic emotions as an example). A speech corpus is presented that was designed to allow differential analysis independent of speaker and content, and tests were carried out to assess its usefulness for automating emotion detection in speech. Working vocal profiles of emotions are proposed. The article also presents proposals for medical applications based on measurements of emotion in the voice.

EN

The paper presents an approach to creating new measures of the emotional content of speech signals. The results of this project constitute the basis for further research in this field. To analyze differences between the basic emotional states independently of speaker and semantic content, a corpus of acted emotional speech was designed and recorded. Alternative methods of acquiring emotional speech signals are presented and discussed (Section 2). Preliminary tests were performed to evaluate the applicability of the corpus to automatic emotion recognition. At the recording labeling stage, human perceptual tests were applied (using recordings with and without semantic content). The results are presented in the form of confusion tables (Tabs. 1 and 2). Further signal processing, namely parametrisation and feature extraction (Section 3), made it possible to extract a set of features characteristic of each emotion and led to preliminary vocal emotion profiles (sets of acoustic features characteristic of each basic emotion); an example is presented in Tab. 3. Using selected feature vectors, methods of automatic classification (k nearest neighbours and a self-organizing neural network) were tested. Section 4 contains the conclusions: an analysis of variables associated with the vocal expression of emotions and the challenges in further development. The paper also discusses the use of results from this kind of research in medical applications (Section 5).
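As an illustration of the k nearest neighbours classification mentioned above, the following is a minimal sketch. It does not reproduce the paper's feature set or settings: the two-dimensional feature vectors, the emotion labels in the example, the value of `k`, and the function name `knn_predict` are all assumptions made for demonstration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, emotion label)


def knn_predict(train: List[Sample], query: Sequence[float], k: int) -> str:
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query vector.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # On a tie, the label whose neighbour appeared first (closest) wins.
    return votes.most_common(1)[0][0]


# Usage with made-up, already normalized feature vectors
# (hypothetical values, not data from the corpus):
train = [
    ((1.0, 1.0), "anger"),
    ((0.9, 1.1), "anger"),
    ((0.8, 0.9), "anger"),
    ((0.1, 0.2), "sadness"),
    ((0.2, 0.1), "sadness"),
]
predicted = knn_predict(train, (0.9, 0.8), k=3)
```

Because the vote depends on raw Euclidean distance, features measured on different scales (for example fundamental frequency in Hz and energy in dB) would normally be normalized before being used this way.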