Search results
Searched in keywords: speech
Results found: 24
EN
Recent papers and studies over the course of the last three years have shown that COVID-19 has had a negative impact on the quality of speech communication between people. This paper presents an analysis of the influence of the curvature of transparent protective shields on the speech signal. Five shields of the same material and dimensions but with different curvatures were analyzed, from a completely flat shield to a very curved one which has the same curvature at its top and bottom and covers the entire face. The influence of the shield was analyzed in two types of experiments: one using a dummy head with an integrated artificial voice device, and the other using real speakers (female and male actors). It has been shown that the use of protective shields results in a relative increase in the speech signal level, in the frequency range around 1000 Hz, compared to the situation when no shield is used. The relative increase in speech signal level for large-curvature shields can be up to 8 dB. The possible causes of this phenomenon are analyzed and examined.
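A minimal sketch of the kind of comparison the study describes: the level of a recording in the band around 1 kHz, in dB, so that with-shield and no-shield takes can be compared. The band edges and FFT settings below are assumptions, not the paper's actual analysis parameters.

import numpy as np
from scipy.signal import welch

def band_level_db(x, fs, f_lo=891.0, f_hi=1122.0):
    # Power in the third-octave band around 1 kHz, expressed in dB
    f, pxx = welch(x, fs=fs, nperseg=4096)
    band = (f >= f_lo) & (f <= f_hi)
    return 10.0 * np.log10(np.trapz(pxx[band], f[band]))

# Relative level change introduced by a shield (signals are hypothetical):
# delta = band_level_db(with_shield, fs) - band_level_db(no_shield, fs)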
EN
This paper presents the results of research on the effects of lossy coding on formant frequencies in Japanese speech signals. Additionally, changes in voice pitch were inspected. Four of the most popular lossy coding standards (MP3, WMA, AAC, and OGG) were chosen for this research and compared against the original WAVE files. The audio files were created by the author based on the ITU-T P.501 recommendation at two sampling frequencies, 16 kHz and 48 kHz, and converted into the chosen codecs. The open-license software Praat was used to extract the data from the audio files. Because of discovered differences in duration between the original and encoded files, which also differed between individual codecs, only the OGG and WMA standards were compared directly. Recordings in the MP3 and AAC standards were divided into Japanese syllables, averaged, and then compared with likewise averaged WAVE files. The results were additionally compared with the lossless FLAC codec.
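A minimal sketch of a Praat-based measurement of this kind, assuming parselmouth (a Python interface to Praat) in place of the Praat GUI; the file name and the sampling time are hypothetical.

import parselmouth

snd = parselmouth.Sound("syllable_wav_vs_mp3.wav")
formants = snd.to_formant_burg(max_number_of_formants=5)
pitch = snd.to_pitch()

t = snd.duration / 2                     # sample at the syllable midpoint
f1 = formants.get_value_at_time(1, t)    # first formant, Hz
f2 = formants.get_value_at_time(2, t)    # second formant, Hz
f0 = pitch.get_value_at_time(t)          # fundamental frequency, Hz
print(f"F0={f0:.1f} Hz, F1={f1:.1f} Hz, F2={f2:.1f} Hz")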
PL
The paper presents an automatic speaker recognition system implemented in the Matlab environment and shows how the individual components of the system were realised and optimised. The main emphasis was placed on selecting the distinctive features of a speaker's voice with a genetic algorithm, which makes it possible to account for the synergy of features during selection. Results of optimising selected elements of the classifier are also shown, including the number of Gaussian distributions used to model each voice. In addition, a universal voice model was used when building the voice models.
EN
The paper presents an automatic speaker recognition system implemented in the Matlab environment and demonstrates how to realise and optimise the individual elements of the system. The main emphasis was placed on feature selection for the speech signal using a genetic algorithm that takes the synergy of features into account. The results of optimising selected elements of the classifier are also shown, including the number of Gaussian distributions used to model each voice. In addition, a universal voice model was used when creating the voice models.
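A minimal sketch of the Gaussian-mixture part of such a system, assuming scikit-learn in place of the Matlab implementation; the feature matrices (e.g. MFCC frames) and speaker names are hypothetical placeholders.

from sklearn.mixture import GaussianMixture

def train_models(features_per_speaker, n_components=16):
    # One GMM per speaker, fitted on (n_frames, n_features) feature matrices
    models = {}
    for speaker, feats in features_per_speaker.items():
        models[speaker] = GaussianMixture(
            n_components=n_components, covariance_type="diag").fit(feats)
    return models

def identify(models, test_feats):
    # The speaker whose model gives the highest average log-likelihood wins
    return max(models, key=lambda s: models[s].score(test_feats))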
4. Polish emotional speech recognition based on the committee of classifiers
EN
This article presents a novel method for emotion recognition from Polish speech. We compared two different databases: spontaneous and acted-out speech. For the purpose of this research, we gathered a set of audio samples with emotional information, which served as the input database. Multiple Classifier Systems were used for classification, with commonly used speech descriptors and different groups of perceptual coefficients as features extracted from the audio samples.
PL
This work concerns the recognition of emotional states on the basis of the voice. In the article we compare spontaneous speech with acted-out speech. For the purposes of the research, emotional audio recordings were collected, constituting a comprehensive input database. We present a novel method of emotion classification that uses committees of classifiers, describing emotions with commonly used speech-signal descriptors and hybrid perceptual coefficients.
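A minimal sketch of a classifier committee, assuming scikit-learn and three arbitrary member classifiers; the paper's actual committee composition and feature set are not reproduced here.

from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

committee = VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier())],
    voting="soft")   # average the members' class probabilities

# committee.fit(X_train, y_train)   # X: descriptor/coefficient vectors
# emotions = committee.predict(X_test)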
EN
Re-speaking is a mechanism for obtaining high-quality subtitles for use in live broadcasts and other public events. Because it relies on humans to perform the actual re-speaking, the task of estimating the quality of the results is non-trivial. Most organizations rely on human effort to perform the quality assessment, but purely automatic methods have been developed for other, similar problems (such as Machine Translation). This paper compares several of these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER, and RIBES. These are then matched to the human-derived NER metric commonly used in re-speaking. The purpose of this paper is to assess whether the above automatic metrics, normally used for MT system evaluation, can be used in lieu of the manual NER metric to evaluate re-speaking transcripts.
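As an illustration, BLEU (one of the metrics compared here) can be computed with NLTK; the sentence pair below is invented, and smoothing keeps short sentences from collapsing to a zero score.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the weather will be sunny tomorrow".split()]
hypothesis = "weather will be sunny tomorrow".split()
score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")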
6. Automatic detection of stuttering in a speech
EN
In this work, the authors applied speech recognition techniques to find disfluent events. A recognition system based on the Hidden Markov Model Toolkit was built and tested. A set of context-dependent HMM models was trained and used to locate speech disturbances. The authors did not concentrate on a specific disfluency type but tried to find any extraneous sounds in the speech signal. Patients read prepared sentences, the system recognized them, and the results were compared with manual transcriptions. This made the system more robust and enabled it to find all disfluency types appearing at word boundaries. Such a system can be utilized in many ways: for example, as a "preprocessor" that finds strange sounds in speech to be analyzed or classified later by other algorithms, to evaluate or track the therapy process of stuttering people, or to evaluate the speech fluency of 'normal' speakers.
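A minimal sketch of HMM-based disfluency spotting, assuming hmmlearn in place of the Hidden Markov Model Toolkit (HTK) used in the paper: one model is trained on fluent speech and one on extraneous sounds, and a segment is labelled by whichever model scores it higher. The feature matrices are hypothetical placeholders.

from hmmlearn import hmm

fluent = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
disfl = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)

# fluent.fit(X_fluent, lengths_fluent)   # X: stacked frames (n_frames, n_features)
# disfl.fit(X_disfl, lengths_disfl)      # lengths: frames per training utterance
# label = "disfluent" if disfl.score(X_seg) > fluent.score(X_seg) else "fluent"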
EN
The therapy of stuttering people is based on a proper selection of texts and then on practising their articulation by reading or narration. The texts are chosen on the basis of the kind and intensity of disfluencies appearing in speech. Thus there is still a need for effective and objective methods of analysing disfluent speech. Hidden Markov models are stochastic models widely used in recognising patterns appearing in a signal. In this work, a simple monophone system based on the Hidden Markov Model Toolkit was built and tested in the context of detecting and classifying phoneme repetitions, a common speech disorder in the Polish language.
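Once a recognizer has produced a phoneme label sequence, repetitions can be flagged by simple post-processing; a toy sketch, with an invented label sequence:

def find_repetitions(phonemes):
    # Indices where a phoneme label immediately repeats: candidate disfluencies
    return [i for i in range(1, len(phonemes))
            if phonemes[i] == phonemes[i - 1]]

print(find_repetitions(["k", "k", "o", "t"]))  # -> [1]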
8. Human singing as a form of bio-communication
EN
Most probably music, similarly to human speech, represents a biological adaptation [1], and singing is a mode of communication older than speech, present already in the ancestors of Homo sapiens [2]. In various species of apes, vocal expression has been demonstrated to be linked with the expression of emotions [3], which indicates that singing is a carrier of emotional information that appeared in evolution before the formation of Homo sapiens. The hierarchical pattern of sound-information processing in the human cognitive system [4] allows one to assume that singing may induce in the recipient both basic emotions and more complex reactions linked to altered mood or the induction of emotions. Processing of specific musical stimuli evokes specific emotional reactions [5]. Contemporary knowledge of music processing in the nervous system, together with the evolutionary perspective, makes it possible to distinguish the traits of a musical course that encode data on the type and intensity of emotions. According to the authors, qualitative coding of principal emotions in a musical course occurs mainly at the segmental level, using physical traits of the sound such as intensity and timbre, while quantitative coding at the suprasegmental level involves mainly changes in the tempo and intensity of sounds. In emotional communication conducted through a musical course, a significant role is also played by the set of culture-specific data on the traits of music, shared by the sender and the recipient, which is necessary for its correct processing in the specific structures of the nervous system linked to cognitive processes. In the study, a hierarchical model of the structure of singing is proposed, which attempts to explain how expression is coded and emotions are perceived in interpersonal communication.
EN
The goal of the paper is to present a speech nonfluency detection method based on linear prediction (LP) coefficients obtained using the covariance method. The application "Dabar" was created for the research. It implements three different LP methods and can feed the coefficients they compute into the input of Kohonen networks. Neural networks were used to classify utterances as fluent or nonfluent. The first was a Kohonen network (SOM), used to reduce the LP-coefficient representation of each window, supplied to the SOM input layer, to a vector of winning neurons of the SOM output layer. Radial Basis Function (RBF) networks, linear networks, and multi-layer perceptrons were used as classifiers. The research was based on 55 fluent samples and 54 samples with blockades on plosives (p, b, d, t, k, g). The examination ended with a classification accuracy of 76%.
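A minimal numpy sketch of covariance-method linear prediction, standing in for (not reproducing) the Dabar application's code: the predictor coefficients are the least-squares solution of the normal equations built from unwindowed signal covariances.

import numpy as np

def lpc_covariance(x, order):
    # Covariance method: least-squares predictor over n = order..N-1,
    # with no windowing of the signal (unlike the autocorrelation method)
    N = len(x)
    X = np.stack([x[order - i:N - i] for i in range(1, order + 1)])
    phi = X @ X.T              # covariance matrix of past samples
    psi = X @ x[order:N]       # cross-covariance with the current sample
    return np.linalg.solve(phi, psi)   # coefficients a_1..a_order

# a = lpc_covariance(window, order=12)  # per-window features for the SOM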
EN
In this work, algorithms commonly utilized in continuous speech recognition systems were applied to the detection of speech disorders. The algorithms used are briefly described and the final method of speech disorder detection is presented. The article includes the results of a short test performed to check the effectiveness and accuracy of the method. The aim of the test was the detection and classification of fricative phoneme prolongation, one of the most common speech disorders in the Polish language. It is worth emphasizing that this method not only enables detection of the category of the speech disturbance (e.g. prolongation of fricatives, nasals, vowels, etc.) but also provides information about the specific phoneme being disturbed.
PL
This paper describes a new way of using adaptive filtering to improve the quality of useful sounds recorded in the presence of interference. The developed adaptation algorithm is presented, the possibilities of processing the sound with additional algorithms are discussed, and the experiments performed are described. The results of the experiments are reported and discussed. A way of integrating the developed method with systems for the acoustic monitoring of an urban agglomeration is proposed.
EN
This paper describes a technique for improving the quality of speech signals recorded under interference (an adaptive-filter-based algorithm). The proposed algorithm is described and additional possibilities of improving speech intelligibility are discussed. The results of the tests are presented. A way of integrating the elaborated method with an agglomeration acoustic monitoring system is proposed. The research was subsidized by the Polish Ministry of Science and Higher Education within Grant No. R00-O0005/3.
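A minimal sketch of one classic approach to this problem, assuming a normalized LMS (NLMS) adaptive filter with a separate reference-noise channel; the paper's own adaptation algorithm is not reproduced here, so this stands in for the general idea.

import numpy as np

def nlms_cancel(ref, noisy, order=32, mu=0.5, eps=1e-8):
    # ref: reference interference channel; noisy: speech + interference
    w = np.zeros(order)
    out = np.zeros(len(noisy))
    for n in range(order, len(noisy)):
        u = ref[n - order:n][::-1]         # most recent reference samples
        y = w @ u                          # estimated interference
        e = noisy[n] - y                   # error = cleaned speech estimate
        w += mu * e * u / (u @ u + eps)    # normalized LMS update
        out[n] = e
    return out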
12. Nonconscious Control of Voice Intensity During Vocalization
EN
There are two separate visual systems in the human brain. Evidence from studies on both humans and other primates has shown that there is a distinction between vision for perception and vision for action, which is reflected in the organization of the visual pathways in the cerebral cortex of primates. In recent years, researchers have attempted to find a similar dissociation between action and perception in human audition. The hypothesis tested in this paper is that voice intensity is tracked and controlled by an auditory motor system, the results of which are used to nonconsciously correct vocal production. To observe the dissociation between perception and motor control, a subliminal experimental situation was created, using values below the perceptual threshold (values not processed through the normal channels or apparatus of perception). The hypothesis was that a subliminal modification of auditory voice feedback would cause an appropriate corrective response, even if the change was not actually perceived. Assuming that the auditory system functions in the same way as the visual one and processes the information vital for motor reactions in real time, a reaction compensating for such a modification should be expected.
EN
In this paper, a method and a program for measuring the time distances between the middle points of spoken words or played sounds are presented. The proposed algorithm can be used to assess the state of persons with dysfunctions of the central nervous system. Differences between the results obtained for healthy and impaired persons are shown.
PL
A method and a program for measuring the times between the middles of words or played sounds are presented. The proposed algorithm can be used to assess the state of persons with dysfunctions of the central nervous system. Differences between the results of healthy and ill persons are shown.
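A rough sketch of the midpoint-timing idea, assuming energy-based segmentation with librosa; the file name and the 30 dB silence threshold are assumptions, not the paper's settings.

import librosa

y, sr = librosa.load("utterance.wav", sr=None)
regions = librosa.effects.split(y, top_db=30)      # non-silent spans (samples)
mids = [(s + e) / 2.0 / sr for s, e in regions]    # midpoints, seconds
gaps = [b - a for a, b in zip(mids, mids[1:])]     # times between midpoints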
EN
The paper contains a short review of analytical, design, and development investigations of low-power electromagnetic microwave sensors (EMS) intended for use in a laryngectophone system. The general principle of operation and examples of designs are presented. The measurement stand used to study the signals produced by sensors of articulator motion is described. The analytical method used to examine the usefulness of these signals for identifying the vibrant phonemes is reviewed.
15. Listeners' reaction time response to speech-in-noise material
EN
The paper addresses the problem of choosing speech material for experiments concerning measurements of the composed reaction time (CRT). Mean reaction times were compared for a group of subjects exposed to the Polish vowels /a, e, i, o, u, y/ and to non-words recorded on a dummy head against traffic noise (European Standard EN 1793-3) generated from an open window. The results of this experiment, analyzed for various signal-to-noise ratios and different reverberation conditions, indicate that the mean reaction time was greater for non-words (when the subjects were exposed to more complex signals) than for vowels. However, differences in the relative growth of reaction time with decreasing signal to noise-source output difference level (SNS) were relatively small.
EN
The Hidden Markov Model (HMM) is a stochastic approach to recognizing patterns appearing in an input signal. In this work, the author's implementation of HMMs was used to recognize speech disorders: prolonged fricative phonemes. To achieve the best recognition effectiveness while keeping the calculation time reasonable, two problems need to be addressed: the choice of the HMM and the proper preparation of the input data. Test results for recognition of the considered type of speech disorder are presented for HMM models with different numbers of states and for different codebook sizes.
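A sketch of the codebook step for a discrete HMM, assuming k-means vector quantization of feature frames; the codebook size is the parameter the abstract says was varied, and the data shapes are hypothetical.

from sklearn.cluster import KMeans

def build_codebook(frames, size=64):
    # frames: (n_frames, n_features); each cluster centre is a codeword
    return KMeans(n_clusters=size, n_init=4).fit(frames)

# codebook = build_codebook(train_frames)
# symbols = codebook.predict(test_frames)   # symbol sequence for a discrete HMM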
17. Mothers and their offspring perceive the tritone paradox in closely similar ways
EN
The tritone paradox is produced when two tones related by a half-octave (or tritone) are presented in succession, and the tones are so constructed that their pitch classes (C, C#, D, and so on) are clearly defined but their pitch heights are ambiguous. When listeners judge whether such tone pairs form ascending or descending patterns, their judgments show orderly relationships to the positions of the tones along the pitch-class circle: tones in one region of the circle are heard as higher and those in the opposite region as lower. However, listeners disagree substantially as to which tones are heard as higher and which as lower, and these perceptual differences correlate with the language or dialect to which the listener has been exposed. In the present study, the perceptions of mothers and their offspring were found to be strikingly similar, indicating that the mental representation influencing perception of the tritone paradox is formed early in life and survives into adulthood. It is conjectured that this mental representation is formed during the critical period in which infants acquire the features of their native language.
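Tones of this kind (Shepard tones) can be synthesized as octave-spaced partials under a fixed bell-shaped spectral envelope, so the pitch class is well defined while the pitch height stays ambiguous; a minimal numpy sketch follows, with the envelope centre and width chosen arbitrarily.

import numpy as np

def shepard_tone(pitch_class, fs=44100, dur=0.5, f_base=27.5, n_oct=8):
    t = np.arange(int(fs * dur)) / fs
    tone = np.zeros_like(t)
    centre = np.log2(440.0)                 # envelope peak in log-frequency
    for k in range(n_oct):
        f = f_base * 2.0 ** (k + pitch_class / 12.0)
        w = np.exp(-0.5 * ((np.log2(f) - centre) / 1.5) ** 2)
        tone += w * np.sin(2 * np.pi * f * t)
    return tone / np.max(np.abs(tone))

# A tritone pair, e.g. pitch classes 0 and 6: ascending or descending?
pair = np.concatenate([shepard_tone(0), shepard_tone(6)])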
PL
One of the more important parameters characterizing the source of voiced speech is the fundamental frequency of the laryngeal tone (pitch, F0), which corresponds to the vibration rate of the vocal folds. The article proposes a method for determining the fundamental frequency of the laryngeal tone based on a multiresolution decomposition of the power spectrum of the speech signal. The method was applied to the analysis of isolated voiced phones.
EN
One of the basic parameters characterizing voiced speech is the fundamental frequency (the so-called pitch, F0), which corresponds to the vibration rate of the vocal folds. In this paper, a new method for determining the fundamental frequency is proposed, based on a multiresolution decomposition of the power spectrum of the speech signal. The method was applied to isolated voiced vowels.
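For comparison, a baseline F0 track can be obtained with librosa's YIN implementation; this is a stand-in, not the paper's multiresolution method, and the file name and search range are assumptions.

import librosa

y, sr = librosa.load("vowel_a.wav", sr=None)
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)   # Hz, one value per frame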
19. Computer-supported individualised therapy of non-fluent speech
EN
The therapy of stuttering people is a time-consuming and long-lasting process which requires great effort from both the logopaedist and the patient. The process can be divided into three parts: recording the patient's utterances (reading, telling, conversation), 20-minute corrective exercises with the echo (reading, telling), and individual work of the stuttering person with difficult words. All of these tasks may be performed with the use of a computer controlled by a special program elaborated for that purpose. The computer system for logopaedic diagnosis and therapy (DTL) allows recording and saving utterances as sound files, practising with an acoustical or visual echo, and performing automatically generated tasks adjusted to the individual difficulties of particular speakers. Examples of analyses performed at various stages of therapy, i.e. at the beginning, during, and after the therapy, supply information concerning e.g. the stuttering intensity and the types of errors occurring. The results presented in this work concern control recordings performed at intervals of 1-1.5 months for twelve patients.
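A minimal sketch of the "acoustical echo" exercise: the patient hears their own voice delayed by a fraction of a second (delayed auditory feedback). The 100 ms delay and 0.7 gain below are assumptions, not the DTL system's settings.

import numpy as np

def echo_feedback(x, fs, delay_ms=100, gain=0.7):
    # Mix the signal with a delayed, attenuated copy of itself
    d = int(fs * delay_ms / 1000)
    y = x.astype(float).copy()
    y[d:] += gain * x[:-d]
    return y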
PL
The article contains a short review of methods for describing human speech, indicating their use at particular stages of human voice analysis. The aim of the publication is not to present detailed analytical descriptions of formulas and advanced mathematical apparatus, but to present the basic assumptions and capabilities of the individual methods. A detailed description of each method can be found in the bibliography listed at the end of the article.