Search results - BazTech

1

Sterowanie głosowe w systemach obróbkowych

Rogowski A.

Obróbka Metalu | 2017 | nr 3 | 36-42

EN

This paper discusses the possibilities of applying voice control to the operation of machining cells composed of CNC machine tools. The problem is presented against the background of existing research on automatic speech recognition, in particular its application in broadly understood manufacturing. The general scheme of the speech recognition algorithm is described, along with its specific features, which depend on the complexity of the commands used. On this basis, possible variants of command types are given. As an example of a working solution, the paper presents the voice control system for the EMCO training robotized machining cell, developed at the Warsaw University of Technology.

PL

W artykule omówiono możliwości stosowania sterowania głosowego przy obsłudze gniazd obróbkowych złożonych z obrabiarek CNC. Zagadnienie to ukazano na tle rezultatów dotychczasowych badań dotyczących automatycznego rozpoznawania mowy, a w szczególności jego zastosowania w szeroko rozumianym wytwarzaniu. Omówiony został ogólny schemat algorytmu rozpoznawania mowy oraz specyfika tego algorytmu zależna od stopnia złożoności stosowanych komend. Na bazie tego podane zostały możliwe warianty typów komend służących do obsługi zautomatyzowanych systemów obróbkowych i warianty ich przetwarzania w zależności od zadań, które mają być realizowane w wyniku tych komend. Jako przykład funkcjonującego rozwiązania przedstawiono opracowany na Politechnice Warszawskiej system sterowania głosowego szkoleniowym zrobotyzowanym gniazdem obróbkowym EMCO.

2

Allophones in automatic speech recognition

Kozierski P., Sadalla T., Dąbrowski A., Giernacki W.

Studia z Automatyki i Informatyki | 2016 | T. 41 | 47-53

EN

The common approach to the speech recognition problem is to use phonemes as the basic units of speech. The authors propose using allophones instead. For the rarest allophones, conversion into other allophones is proposed, with 4 methods for selecting the sounds to convert. Based on the obtained results, the authors conclude that effective use of the additional information carried by allophonic notation will not be possible without modifying currently used algorithms.

PL

Typowym podejściem do zagadnienia rozpoznawania mowy jest branie pod uwagę fonemów, jako podstawowych części mowy. Zamiast tego autorzy zaproponowali wykorzystanie alofonów. Dla najrzadziej występujących alofonów zaproponowano ich zamianę na inne alofony – zaproponowano 4 metody wyboru głosek do zamiany. Na podstawie uzyskanych wyników stwierdzono, że efektywne wykorzystanie dodatkowych informacji, jakie niosą alofony, nie będzie możliwe bez modyfikacji obecnie dostępnych algorytmów.

3

System rozpoznawania mowy polskiej dla robota społecznego

Zygadło A., Janicki A., Dąbek P.

Pomiary Automatyka Robotyka | 2016 | R. 20, nr 4 | 27-36

PL

W artykule przedstawiono system automatycznego rozpoznawania mowy polskiej dedykowany dla robota społecznego. System oparty jest na bezpłatnej i otwartej bibliotece oprogramowania pocketsphinx (CMU Sphinx). Przygotowano zbiory nagrań: treningowy i testowy wraz z transkrypcjami. Zbiór treningowy obejmował głosy 10 kobiet i 10 mężczyzn i został przygotowany na podstawie audiobooków, natomiast zbiór testowy – głosy 3 kobiet i 3 mężczyzn nagrane w warunkach laboratoryjnych specjalnie na potrzeby pracy. Przygotowany zbiór fonemów dla języka polskiego, składający się z 39 fonemów, opracowany został na podstawie dwóch popularnych zbiorów dostępnych danych. Słownik fonetyczny opracowano za pomocą funkcjonalności konwersji grapheme-to-phoneme z biblioteki eSpeak. Model statystyczny języka dla tekstu referencyjnego składającego się z 76 komend wygenerowano za pomocą programu cmuclmtk (CMU Sphinx). Uczenie modelu akustycznego oraz test jakości rozpoznawania mowy przeprowadzono za pomocą programu sphinxtrain (CMU Sphinx). W warunkach laboratoryjnych uzyskano wskaźnik błędu rozpoznawania słów (WER) na poziomie 4% i błędu rozpoznawania zdań (SER) na poziomie 9%. Przeprowadzono też badania systemu w warunkach rzeczywistych na grupie testowej złożonej z 2 kobiet i 3 mężczyzn, uzyskując wstępne wyniki rozpoznawania na poziomie 10% (SER) z bliskiej odległości oraz 60% (SER) z odległości 3 m. Określono kierunki dalszych prac.

EN

An automatic speech recognition system for Polish, dedicated to social robotics applications, is presented. The system is based on the free and open-source software library pocketsphinx (CMU Sphinx). Training and test databases were prepared with transcriptions: the training database comprised the voices of 10 women and 10 men and was prepared from audiobooks, whereas the test database comprised the voices of 3 women and 3 men recorded in laboratory conditions specifically for this work. A phoneme set for Polish consisting of 39 phonemes was prepared on the basis of two popular sets from other researchers. The phonetic dictionary was obtained using grapheme-to-phoneme conversion from the eSpeak speech synthesis tool. The statistical language model for the reference text, comprising 76 commands, was generated using the cmuclmtk tool (CMU Sphinx). Training of the acoustic model and testing of recognition quality were conducted using the sphinxtrain tool (CMU Sphinx). In laboratory conditions, a word error rate (WER) of 4% and a sentence error rate (SER) of 9% were obtained. The system was then tested in real conditions on a test group of 2 women and 3 men. The initial, tentative results are about 10% SER at close speaker-to-microphone distance and about 60% SER at a distance of 3 m. Directions for future work are formulated.
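The WER and SER figures quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions: WER as the word-level edit distance divided by the number of reference words, and SER as the fraction of utterances with at least one error. The command strings are hypothetical and are not taken from the paper's 76 commands, and the function names are illustrative rather than part of any CMU Sphinx tool.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_and_ser(references, hypotheses):
    """Return (word error rate, sentence error rate) for paired transcripts."""
    total_words = 0
    word_errors = 0
    wrong_sentences = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors = edit_distance(ref_words, hyp_words)
        word_errors += errors
        total_words += len(ref_words)
        if errors > 0:
            wrong_sentences += 1
    return word_errors / total_words, wrong_sentences / len(references)


# Hypothetical commands: one substituted word in one of four utterances.
refs = ["robot go forward", "robot stop", "turn left now", "say hello"]
hyps = ["robot go forward", "robot stop", "turn right now", "say hello"]
wer, ser = wer_and_ser(refs, hyps)
```

In this toy case one wrong word out of ten gives a WER of 0.1, while one wrong utterance out of four gives an SER of 0.25, which shows why SER is typically the higher of the two figures.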

4

Grupowanie mówców i jego skuteczność dla języka polskiego

Zambrzycka A., Makowski R., Hossa R.

Elektronika : konstrukcje, technologie, zastosowania | 2016 | Vol. 57, nr 7 | 45-50

PL

Grupowanie mówców w zbiory o podobnych cechach akustycznych ich mowy, obok normalizacji i adaptacji, jest skuteczną metodą poprawy jakości systemów automatycznego rozpoznawania mowy. W pracy przedstawiono metody grupowania, dla których punktem wyjścia jest model akustyczny wszystkich mówców oraz ich efektywność dla mowy polskiej w odniesieniu głównie do samogłosek. Rozwiązania te okazały się być skuteczne nawet przy wykorzystaniu superkrótkiej wypowiedzi. Uzyskana poprawa jakości rozpoznawania ramek mierzona za pomocą frame error rate wynosi około 4%.

EN

Clustering speakers into groups with similar acoustic features of their speech is, alongside normalization and adaptation, an effective method of improving the quality of automatic speech recognition systems. This paper presents and discusses new approaches to speaker clustering that start from an acoustic model of all speakers, and their efficiency for Polish speech, mostly with regard to vowels. The results show the strong performance of the new solutions, even when very short speech segments were used. The obtained improvement in frame recognition quality, measured by frame error rate, was about 4%.

5

Text Independent Automatic Speaker Recognition System using fusion of features

Majda-Zdancewicz E., Dobrowolski A. P.

Przegląd Elektrotechniczny | 2015 | R. 91, nr 10 | 247-251

EN

This paper presents a speaker recognition system that is independent of the linguistic content of the utterance. The tasks addressed include the preprocessing stage, segmentation of the speech signal leading to feature extraction based on three techniques, selection of the most important features, and a classification stage involving a serial combination of classifiers. Sets of descriptors were obtained using three techniques: cepstral coefficients, mel-cepstral coefficients and original weighted cepstral coefficients. The optimal robust “Voice Print” was determined using Fisher coefficients and PCA. Experiments on the 2002 NIST Speaker Recognition Evaluation corpus show that the proposed system is able to recognise the speaker with great accuracy regardless of the speech content and even of the language.

PL

W pracy przedstawiono system rozpoznawania mówcy niezależny od tekstu wypowiedzi. Rozwiązane problemy obejmują: etap przetwarzania wstępnego, segmentację sygnału mowy prowadzącą do etapu ekstrakcji cech bazującej na trzech technikach analizy sygnału mowy, selekcję najbardziej istotnych cech oraz etap klasyfikacji obejmujący analizę kaskady klasyfikatorów. Zestaw cech uzyskano przy użyciu trzech technik: cepstrum, mel-cepstrum oraz autorskich ważonych cech cesptralnych. Optymalny wektor cech wyekstrahowano przy użyciu współczynników istotności Fishera oraz analizy PCA. Eksperymenty z wykorzystaniem bazy 2002 NIST Speaker Recognition Evaluation pokazują, że przedstawiony system rozpoznaje mówcę niezależnie od ograniczeń lingwistycznych treści, a nawet języka wypowiedzi, z zadowalającą dokładnością.
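The abstract mentions feature selection with Fisher coefficients without giving a formula. The sketch below assumes one common two-class form, the squared difference of class means divided by the sum of class variances, and ranks features by it. The sample values and function names are hypothetical, and the paper's actual definition and multi-speaker handling may differ.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Two-class Fisher ratio for one feature (assumed form, not taken from the paper).

    Raises ZeroDivisionError if both classes have zero variance.
    """
    return (mean(class_a) - mean(class_b)) ** 2 / (pvariance(class_a) + pvariance(class_b))


def rank_features(samples_a, samples_b):
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(samples_a[0])
    scores = []
    for k in range(n_features):
        col_a = [row[k] for row in samples_a]
        col_b = [row[k] for row in samples_b]
        scores.append((fisher_score(col_a, col_b), k))
    return [k for _, k in sorted(scores, reverse=True)]


# Hypothetical feature vectors for two speakers (rows are frames, columns are features).
speaker_a = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
speaker_b = [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
ranking = rank_features(speaker_a, speaker_b)
```

Here feature 0 separates the two speakers clearly, while feature 1 has identical class means and scores zero, so a selection step would keep feature 0 first.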

6

Przetwarzanie mowy w celu sterowania urządzeniami mechatronicznymi

Tarasiuk M., Gosiewski Z.

Mechanik | 2015 | R. 88, nr 7 | 572-575

PL

Przedstawiono etapy opracowania metody parametryzacji sygnałów mowy. Adaptowano dekompozycję paczkowej transformacji falkowej oraz zastosowano rozplot homomorficzny. Dzięki wykorzystaniu niejawnych modeli Markowa do rozpoznawania zweryfikowano działanie opracowanej metody. Badania stanowią punkt wyjścia do wdrożenia automatycznego systemu rozpoznawania mowy do sterowania urządzeniami mechatronicznymi.

EN

The stages of developing a speech signal parameterization method are presented. Wavelet packet decomposition was adapted and homomorphic deconvolution was applied. The developed method was verified by using hidden Markov models for recognition. The study is a starting point for implementing an automatic speech recognition system for the control of mechatronic devices.

7

Rozpoznawanie wieku i płci na podstawie analizy głosu

Gabryś J., Gil G., Kiszka P.

Acta Bio-Optica et Informatica Medica. Inżynieria Biomedyczna | 2015 | Vol. 21, nr 3 | 165-169

PL

Metody automatycznego rozpoznawania wieku i płci pozwalają na rozpoznanie cech osoby mówiącej tylko na podstawie nagrania jej wypowiedzi. Mowa ludzka, poza werbalnym komunikatem, niesie ze sobą informacje dotyczące osoby mówiącej. Nagranie mowy osoby pozwala na wyodrębnienie takich informacji, jak jej płeć, wiek, a także emocje. Zaprezentowano przegląd metod rozpoznawania wieku i płci osób na podstawie ich mowy oraz wykonano implementację i przetestowano połączenie metod wyznaczania parametrów MFCC (współczynniki analizy cepstralnej w skali mel (Mel-frequency Cepstral Coefficients) i wysokości tonu głosu f0 oraz algorytmu SVM (metoda wektorów nośnych - Support Vector Machines) do klasyfikacji próbek głosowych. Testy zaimplementowanego rozwiązania pozwalają stwierdzić, że metoda jest skuteczna w większości przypadków testowych.

EN

Methods for automatic recognition of age and gender make it possible to identify characteristics of a speaker solely from a recording of that person's speech. Human speech, beyond its verbal message, carries information about the speaker. A speech recording allows the extraction of personal characteristics such as gender, age, and emotions. The paper presents an overview of methods for recognizing the age and gender of people based on their speech. A combination of MFCC (Mel-frequency Cepstral Coefficients) and voice pitch (f0) parameters with the SVM (Support Vector Machines) algorithm for classifying voice samples was implemented and tested. The tests showed that the method is effective in the majority of test cases.
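The abstract does not say how the pitch (f0) was extracted. As an illustration only, the sketch below uses the textbook autocorrelation approach: it searches for the lag at which a frame best matches a shifted copy of itself and converts that lag to a frequency. The search range of 50 to 400 Hz, the synthetic 200 Hz tone and the function name are assumptions made for the example.

```python
import math


def estimate_f0(samples, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak.

    The frame must be longer than sample_rate / f_min samples.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        value = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic stand-in for a voiced frame: a pure 200 Hz tone, 0.1 s at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(tone, sample_rate)
```

For the synthetic tone the strongest peak falls at a lag of 40 samples, which corresponds to 200 Hz. Real speech would need framing, voicing detection and some protection against octave errors, none of which is shown here.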

8

Automatic recognition of voice commands in a car cabin

Mięsikowska M., Ruiter de E.

Pomiary Automatyka Kontrola | 2014 | R. 60, nr 8 | 652-654

EN

Automatic speech recognition systems are used in vehicles. Voice commands make it possible to control a navigation system, an air conditioning system and a media player, and to make phone calls. The effectiveness of speech recognition systems depends largely on the acoustic conditions in the vehicle cabin. Recognition accuracy, in turn, determines whether speech recognition can be extended beyond the basic functions listed above to the control of systems that affect the movement of the vehicle. The work presents preliminary results of research on speech recognition and on the evaluation of speech intelligibility in a vehicle cabin in the presence of noise barriers. These results may be helpful in assessing speech intelligibility and the results of automatic speech recognition in a vehicle cabin.

PL

Systemy automatycznego rozpoznawania mowy są aplikowane w pojazdach. Za pomocą komend głosowych możemy sterować nawigacją, systemem klimatyzacji, odtwarzaczem multimediów, oraz wykonywać połączenia telefoniczne. Skuteczność systemów rozpoznawania mowy zależna jest w dużej mierze od warunków akustycznych panujących w kabinie pojazdu. Natomiast dokładność rozpoznawania, warunkuje możliwość rozszerzenia funkcjonalności stosowania systemów rozpoznawania mowy nie tylko do podstawowych funkcji wymienionych wyżej, ale także do sterowania układami mającymi wpływ na poruszanie się pojazdu. Praca pokazuje wstępne wyniki badań w zakresie rozpoznawania mowy oraz oceny zrozumiałości mowy w kabinie pojazdu w obecności ekranów akustycznych. Wyniki badań mogą okazać się pomocne w ocenie zrozumiałości mowy i rezultatów automatycznego rozpoznawania mowy w kabinie pojazdu.

9

Zastosowania systemów rozpoznawania mowy do sterowania i komunikacji głosowej z urządzeniami mechatronicznymi

Regulski R., Nowak A.

Pomiary Automatyka Robotyka | 2013 | R. 17, nr 2 | 467-474

PL

Artykuł przedstawia przykłady wykorzystania systemów automatycznego rozpoznawania mowy do budowy głosowych interfejsów typu człowiek-maszyna. W artykule opisano sposób działania takich aplikacji pod kątem sterowania i komunikacji głosowej. W następnej części przedstawiono koncepcję i budowę systemu rozpoznawania mowy do komunikacji z 32-bitowym modułowym sterownikiem pralki.

EN

This paper presents examples of the use of automatic speech recognition systems to build human-machine voice interfaces. It also briefly describes how such applications work in terms of voice control and communication. The remainder of the article presents the concept and design of a speech recognition system for communicating with a 32-bit modular washing machine controller.

10

Pipelined language model construction for Polish speech recognition

Sas J., Żołnierek A.

International Journal of Applied Mathematics and Computer Science | 2013 | Vol. 23, no. 3 | 649-668

EN

The aim of the work described in this article is to develop and experimentally evaluate a consistent method of Language Model (LM) construction for Polish speech recognition. The proposed method takes into account the features and specific problems encountered in practical applications of speech recognition in Polish: rich inflection, a loose word order and the tendency for short word deletion. The LM is created in five stages. Each successive stage takes the model prepared at the previous stage and modifies or extends it so as to improve its properties. At the first stage, typical methods of LM smoothing are used to create the initial model. The four most frequently used methods of LM construction are considered here. At the second stage, the model is extended to take into account words indirectly co-occurring in the corpus. At the next stage, LM modifications are aimed at reducing short word deletion errors, which occur frequently in Polish speech recognition. The fourth stage extends the model by inserting words that were not observed in the corpus. Finally, the model is modified so as to ensure highly accurate recognition of very important utterances. The performance of the applied methods is tested in four language domains.

11

Evaluation of voice-based data entry to an electronic health record system for dentistry

Chleborad K., Zvara Jr. K., Dostalova T., Zvara K., Hippmann R., Ivancakova R., Zvarova J., Smidl L., Trmal J., Psutka J.

Biocybernetics and Biomedical Engineering | 2013 | Vol. 33, no. 4 | 204-210

EN

This paper compares three methods of storing patient data in dentistry: the paper dental card, a lifetime dental EHR controlled by keyboard and a lifetime dental EHR controlled by voice. The EuroMISE Center developed a pilot EHR application called MUDR Lite (multimedia distributed electronic health record). The study compares the time needed to enter or update information about a patient's dental status using these three methods. The paper dental card is the fastest method, but not the best one for medical documentation and for dentists.

12

The new method of the inter-phonemes transitions finding

Dulas J.

Przegląd Elektrotechniczny | 2012 | R. 88, nr 10a | 135-138

EN

This article describes a new method of locating inter-phoneme transitions based on image recognition. Automatically locating the boundaries between phonemes is equivalent to determining the number of phonemes in a given word. This number is an important parameter used in automatic speech recognition systems.

PL

Artykuł przedstawia nową metodę lokalizacji przejść międzyfonemowych opartą o analizę obrazów. Automatyczne określenie miejsc przejść międzyfonemowych jest równoznaczne z określeniem liczby fonemów występujących w danym wyrazie. Jest to ważny parametr wykorzystywany w systemach automatycznej identyfikacji sygnałów mowy. (Nowa metoda lokalizacji przejść międzyfonemowych).

13

Pitch period’s properties and the new method used for finding them

Dulas J.

Przegląd Elektrotechniczny | 2012 | R. 88, nr 7a | 297-300

EN

This article describes interesting properties of pitch periods, which occur in every vowel and voiced consonant. It also describes a new method of finding pitch periods and measuring their duration. Correctly finding the pitch periods and determining their duration is an important element of the automatic word recognition algorithm developed by the author.

PL

Artykuł przedstawia interesujące właściwości okresów podstawowych tonu krtaniowego występującego we wszystkich samogłoskach i spółgłoskach dźwięcznych oraz nową metodę ich odnajdywania i wyznaczania ich długości. Poprawne odnajdywanie okresów podstawowych i wyznaczanie czasu ich trwania jest ważnym elementem algorytmu automatycznej identyfikacji słów opracowanego przez autora.

14

Building compact language models for medical speech recognition in mobile devices with limited amount of memory

Sas J.

Journal of Medical Informatics & Technologies | 2012 | Vol. 20 | 111-119

EN

The article presents a method of building a compact language model for speech recognition in devices with a limited amount of memory. The most commonly used bigram word-based language models allow for highly accurate speech recognition but need a large amount of memory, mainly due to the large number of word bigrams. The method proposed here ranks bigrams according to their importance in speech recognition and replaces explicit estimates of the probabilities of less important bigrams with probabilities derived from a class-based model. The class-based model is created by assigning words appearing in the corpus to classes corresponding to syntactic properties of words. The classes represent various combinations of part-of-speech and inflectional features such as number, case, tense and person. To minimize the memory needed to store the class-based model, a method that reduces the number of part-of-speech classes has been applied; it merges classes that appear in stochastically similar contexts in the corpus. Experiments carried out on selected domains of medical speech show that the method allows a 75% reduction of model size without significant loss of speech recognition accuracy.
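The idea of keeping only the most important word bigrams and deriving the rest from a class-based model can be sketched as follows. The abstract does not define the importance ranking, so this sketch simply assumes raw bigram frequency as a stand-in. The fallback uses the standard class-bigram factorization P(w2 | w1) ≈ P(c2 | c1) · P(w2 | c2). The corpus, the class labels and the function name are hypothetical, and the resulting mixture is not renormalized.

```python
from collections import Counter


def build_compact_model(sentences, word_class, keep):
    """Return a function prob(w1, w2) backed by `keep` explicit bigrams plus a class model."""
    word_bigrams, history_counts = Counter(), Counter()
    class_bigrams, class_history = Counter(), Counter()
    word_counts, class_counts = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        for w in words:
            word_counts[w] += 1
            class_counts[word_class[w]] += 1
        for w1, w2 in zip(words, words[1:]):
            word_bigrams[(w1, w2)] += 1
            history_counts[w1] += 1
            class_bigrams[(word_class[w1], word_class[w2])] += 1
            class_history[word_class[w1]] += 1

    # Assumption: "importance" is approximated by raw bigram frequency.
    explicit = {bg: n / history_counts[bg[0]] for bg, n in word_bigrams.most_common(keep)}

    def prob(w1, w2):
        if (w1, w2) in explicit:
            return explicit[(w1, w2)]
        c1, c2 = word_class[w1], word_class[w2]
        if class_history[c1] == 0:
            return 0.0
        # Class-based estimate: P(c2 | c1) * P(w2 | c2).
        return (class_bigrams[(c1, c2)] / class_history[c1]) * (word_counts[w2] / class_counts[c2])

    return prob


# Hypothetical toy corpus and word classes.
corpus = ["patient has fever", "patient has cough", "doctor sees patient"]
classes = {"patient": "NOUN", "doctor": "NOUN", "fever": "NOUN",
           "cough": "NOUN", "has": "VERB", "sees": "VERB"}
prob = build_compact_model(corpus, classes, keep=1)
```

With `keep=1` only the bigram "patient has" is stored explicitly. Every other pair, including ones never seen in the corpus such as "sees fever", gets its probability from the much smaller class tables, which is where the memory saving comes from.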

15

Szybka metoda identyfikacji fonemów szumowych występujących w cyfrach wypowiadanych w języku polskim

Dulas J.

Przegląd Elektrotechniczny | 2011 | R. 87, nr 2 | 242-245

PL

Niniejszy artykuł jest sprawozdaniem z zakończonego, kolejnego etapu prac autora nad stworzeniem systemu automatycznej identyfikacji cyfr wypowiadanych w języku polskim. Przedstawia on metodę automatycznego rozpoznawania fonemów szumowych przetestowaną na 100 nagraniach cyfr "trzy" i "cztery" pochodzących od mówców różnej płci i w różnym wieku.

EN

This article reports on the latest completed stage of the author's work on an automatic recognition system for digits spoken in Polish. It describes a method for the automatic recognition of noisy phonemes, tested on 100 recordings of the digits "trzy" (three) and "cztery" (four) from speakers of different sexes and ages.

16

Automatyczne rozpoznawanie cyfr w języku polskim - identyfikacja fonemów szumowych

Dulas J.

Przegląd Elektrotechniczny | 2011 | R. 87, nr 1 | 280-283

PL

Artykuł opisuje kolejny etap badań prowadzonych przez autora, zmierzających do stworzenia systemu umożliwiającego automatyczną identyfikację i sterowanie za pomocą cyfr wypowiadanych w języku polskim. Przedstawiono tu nową metodę odnajdywania fonemów szumowych. W trakcie badań wykorzystywana jest baza nagrań Corpora opracowana na Politechnice Poznańskiej uzupełniona o własne nagrania.

EN

The article describes the next stage of the author's research, which aims to build a system for automatic recognition of, and control by, digits spoken in Polish. A new method of finding noisy phonemes is presented. The research uses the Corpora recording database, developed at the Poznań University of Technology, supplemented with the author's own recordings.

17

Automatic word's identification algorithm used for digits classification

Dulas J.

Przegląd Elektrotechniczny | 2011 | R. 87, nr 11 | 230-233

EN

This article presents the results of several years of the author's work on an algorithm for the automatic identification of digits spoken in Polish. The novel method, based on recognition of images obtained from the time characteristics of utterances, gives better results than the commonly used frequency-domain analyses.

PL

Artykuł przedstawia efekt kilkuletnich prac autora nad stworzeniem algorytmu automatycznej identyfikacji cyfr wypowiadanych w języku polskim. Nowatorska metoda wykorzystująca analizę obrazów otrzymanych z charakterystyk czasowych wypowiedzi pozwala na osiągnięcie lepszych rezultatów niż stosowane powszechnie analizy widmowe.

18

Automatyczna segmentacja sygnałów mowy w oparciu o metodę siatek o zmiennych parametrach

Dulas J.

Przegląd Elektrotechniczny | 2010 | R. 86, nr 1 | 229-232

PL

Artykuł przedstawia nową metodę segmentacji sygnału mowy opracowaną przez autora i przetestowaną na zbiorze 50-ciu nagrań pochodzących od osób różnej płci i w różnym wieku. Metoda ta bazuje na rozpoznawaniu obrazów uzyskanych z analizy charakterystyk czasowych tych nagrań.

EN

The article describes a new method of speech signal segmentation. The method was developed by the author and tested on 50 recordings from people of different ages and sexes. It is based on recognition of images obtained from the time characteristics of these recordings.

19

Automatyczna identyfikacja cyfr dla mówców polskojęzycznych

Dulas J.

Przegląd Elektrotechniczny | 2010 | R. 86, nr 5 | 15-18

PL

Artykuł przedstawia aktualny stan prac autora nad wdrożeniem systemu automatycznej identyfikacji cyfr wypowiadanych w języku polskim. Pokazuje również obecnie wykorzystywane techniki rozpoznawania mowy w innych językach oraz osiągane tam rezultaty. W trakcie badań wykorzystywana jest baza nagrań Corpora opracowana na Politechnice Poznańskiej.

EN

The article describes the current progress of the author's work on implementing an automatic recognition system for digits spoken in Polish. It also reviews speech recognition techniques currently used for other languages and the results achieved with them. The research uses the Corpora recording database developed at the Poznań University of Technology.

20

Optimal spoken dialog control in hands-free medical information systems

Sas J.

Journal of Medical Informatics & Technologies | 2009 | Vol. 13 | 113-120

EN

The paper presents a method for the optimal selection of utterances used as command entry-words in a voice-controlled application. Voice-controlled programs seem particularly useful in medical informatics, where a physician interacts with a program by voice while operating a medical device or performing examinations that require manual activity. The proposed method selects command words from sets of proposals defined for each command so as to minimize the overall probability of incorrect command recognition. First, the entry-word dissimilarity matrix is calculated. Word dissimilarities are evaluated using HMMs built from appropriately trained acoustic models of the phonemes constituting the words. The trained HMM is used as a generator of sample utterances for the word. The artificially created utterance samples are then recognized by speech recognizers created for pairs of words. The estimated probability of correct recognition is used as the word dissimilarity measure. The word dissimilarities are then used to determine the average assessment of word selections that can be used as commands. A selection is created by choosing a single word from the set of candidates defined for each command. Finally, a suboptimal selection is found using a genetic algorithm. The experiments show that suboptimal selection of command entry-words can noticeably increase the accuracy of spoken command recognition in many cases.
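The selection step described in this abstract can be sketched on a toy scale. The sketch assumes the pairwise word dissimilarities are already available (in the paper they come from HMM-based experiments), and it replaces the genetic algorithm with plain exhaustive search, which is only feasible for a handful of commands. All words, dissimilarity values and function names are hypothetical.

```python
from itertools import combinations, product


def average_dissimilarity(selection, dissimilarity):
    """Mean pairwise dissimilarity of the words chosen for the commands."""
    pairs = list(combinations(selection, 2))
    return sum(dissimilarity[frozenset(pair)] for pair in pairs) / len(pairs)


def select_command_words(candidates, dissimilarity):
    """Pick one word per command by exhaustive search (stand-in for the genetic algorithm)."""
    return max(product(*candidates),
               key=lambda selection: average_dissimilarity(selection, dissimilarity))


# Hypothetical candidate entry-words for three commands.
candidates = [["start", "begin"], ["stop", "halt"], ["next", "forward"]]

# Hypothetical dissimilarities (higher means less likely to be confused).
raw = {
    ("start", "stop"): 0.60, ("start", "halt"): 0.95,
    ("begin", "stop"): 0.90, ("begin", "halt"): 0.92,
    ("start", "next"): 0.85, ("start", "forward"): 0.93,
    ("begin", "next"): 0.88, ("begin", "forward"): 0.91,
    ("stop", "next"): 0.80, ("stop", "forward"): 0.94,
    ("halt", "next"): 0.89, ("halt", "forward"): 0.96,
}
dissimilarity = {frozenset(pair): value for pair, value in raw.items()}

best = select_command_words(candidates, dissimilarity)
```

With these made-up values the search avoids pairing "start" with the acoustically similar "stop" and settles on "start", "halt" and "forward". The number of combinations grows exponentially with the number of commands, which is presumably why the paper resorts to a genetic algorithm for realistic command sets.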