Search results - BazTech

1

Czech parliament meeting recordings as ASR training data

Krůza Jan Oldřich

Annals of Computer Science and Information Systems | 2020 | Vol. 21 | pp. 185-188

EN

I present a way to leverage the stenographed recordings of the Czech parliament meetings for purposes of training a speech-to-text system. The article presents a method for scraping the data, acquiring word-level alignment and selecting reliable parts of the imprecise transcript. Finally, I present an ASR system trained on these and other data.

2

Zastosowanie algorytmów normalizacji tekstu na potrzeby syntezy mowy w urządzeniach przenośnych

Zacniewski A., Kleinszmidt M.

Biuletyn Wojskowej Akademii Technicznej | 2018 | Vol. 67, nr 2 | pp. 89-97

PL

W artykule pokazano kolejne etapy występujące w syntezie mowy, a także sposoby postępowania z poszczególnymi fragmentami tekstu, który ma zostać przetworzony na mowę. Przedstawiono wyniki badań wydajności algorytmów normalizacji treści realizowanych na potrzeby projektu Toucan Eye - urządzenia przenośnego z systemem sztucznej inteligencji, mającego wspomóc osoby z dysfunkcją wzroku. Pokazano, jak istotne są dobranie i optymalizacja zastosowanych algorytmów ze strony implementacyjnej, po to by zwiększyć komfort użytkownika końcowego.

EN

The article presents the consecutive stages of speech synthesis and the ways of handling particular fragments of the text that is to be converted into speech. It reports performance measurements for the text normalization algorithms developed for the Toucan Eye project – a portable device with an artificial intelligence system intended to help people with impaired sight. The article shows how essential the choice and optimization of the applied algorithms are on the implementation side for increasing the end user's comfort.

3

Badania szybkości i jakości metod syntezy mowy na potrzeby zastosowania w urządzeniu przenośnym

Zacniewski A., Zdunek R.

Biuletyn Wojskowej Akademii Technicznej | 2018 | Vol. 67, nr 2 | pp. 99-108

PL

W artykule przeanalizowano szereg metod dotyczących syntezy mowy, mając na uwadze ich wykorzystanie w urządzeniu przenośnym. Badania realizowano na urządzeniach o zróżnicowanych parametrach, a badanymi kryteriami były skuteczność danej metody i jej szybkość. Badania są częścią projektu Toucan Eye - urządzenia przenośnego z systemem sztucznej inteligencji, mającego wspomóc osoby z dysfunkcją wzroku. Pokazano również, jak ważne jest zoptymalizowanie zastosowanych metod w fazie projektu inżynierskiego, w celu zapewnienia lepszej jakości pracy urządzenia i komfortu użytkownika końcowego.

EN

The article analyses a range of speech synthesis methods with a view to their use in a portable device. The research was carried out on devices with varied parameters, and the evaluation criteria were the accuracy and speed of each method. The research is part of the Toucan Eye project – a portable device with an artificial intelligence system intended to help people with impaired sight. The article also shows how important it is to optimize the applied methods during the engineering design phase in order to ensure better device performance and end-user comfort.

4

Voiceless Stop Consonant Modelling and Synthesis Framework Based on MISO Dynamic System

Korvel G., Kostek B.

Archives of Acoustics | 2017 | Vol. 42, No. 3 | pp. 375-383

EN

A framework for modelling and synthesising voiceless stop consonant phonemes is proposed, in which the phoneme is modelled separately in the low-frequency and high-frequency ranges. The phoneme signal is decomposed into sums of simpler basic components and described as the output of a linear multiple-input single-output (MISO) system. The impulse response of each channel is a third-order quasi-polynomial. Using this framework, the limit between the frequency ranges is determined, and a new three-step algorithm for finding this limit point is given. Within this framework, the input of the low-frequency component is equal to one, and the impulse response generates the whole component. The high-frequency component appears when the system is excited by semi-periodic impulses. The filter impulse response of this component model is single period and decays after three periods. Applying the proposed modelling framework to voiceless stop consonant phonemes has shown that the quality of the model is sufficiently good.

5

Zastosowanie rozpoznawania mówcy w automatycznej translacji mowy typu speech-to-speech

Kłosowski P., Dustor A., Izydorczyk J.

Studia Informatica | 2014 | Vol. 35, nr 3 | pp. 71-81

PL

Przedstawiony artykuł dotyczy zagadnień związanych z funkcjonowaniem systemów automatycznej translacji mowy ciągłej. W systemach tych wykorzystuje się techniki przetwarzania języka naturalnego realizowane z wykorzystaniem algorytmów automatycznego rozpoznawania mowy, automatycznej translacji tekstów oraz zamiany tekstu na mowę za pomocą syntezy mowy. W artykule zaproponowano także metodę usprawnienia procesu automatycznej translacji mowy przez zastosowanie algorytmów automatycznej identyfikacji mówcy, pozwalających na automatyczną segmentację mowy pochodzącej od różnych mówców.

EN

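This paper concerns systems for the automatic translation of continuous speech. These systems use natural language processing techniques implemented with algorithms for automatic speech recognition, automatic text translation and text-to-speech conversion by speech synthesis. The paper also proposes a way to improve automatic speech translation by applying automatic speaker identification algorithms, which allow automatic segmentation of speech coming from different speakers.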

6

Improving speech processing based on phonetics and phonology of Polish language

Kłosowski P.

Przegląd Elektrotechniczny | 2013 | R. 89, nr 8 | pp. 303-307

EN

The article presents methods of improving speech processing based on the phonetics and phonology of the Polish language. The new speech recognition method presented is based on detecting the distinctive acoustic parameters of Polish phonemes. Distinctiveness was adopted as the most important criterion for selecting the parameters that represent objects from the recognized classes. Speech recognition is widely used in telecommunications applications.

PL

W artykule zaprezentowano metody usprawnienia przetwarzania mowy wykorzystując do tego celu wiedzę z zakresu fonetyki i fonologii języka polskiego. Przedstawiona innowacyjna metoda automatycznego rozpoznawania mowy polega na detekcji akustycznych parametrów dystynktywnych fonemów mowy polskiej. O dystynktywności cech decydują parametry niezbędne do klasyfikacji fonemów.

7

System syntezy mowy polskiej z zastosowaniem platformy wbudowanej

Owczarek M., Poryzała P.

Elektronika : konstrukcje, technologie, zastosowania | 2013 | Vol. 54, nr 9 | pp. 75-78

PL

W pracy opisano system do syntezy mowy zbudowany z wykorzystaniem 32-bitowego mikrokontrolera z rdzeniem ARM Cortex-M4. System umożliwia syntezę mowy na podstawie tekstu wprowadzonego przez użytkownika. Jako podstawę algorytmiczną mechanizmu syntezy mowy wykorzystano syntezator formantowy eSpeak (projekt o otwartym źródle) dla komputerów PC. Został on przeniesiony na wybraną platformę docelową, z uwzględnieniem istniejących ograniczeń oraz wymagań warstwy sprzętowej. Opracowano narzędzia realizujące konwersję plików danych programu eSpeak do postaci tablic wartości oraz struktur danych kompilowanych wraz z kodem programu. Napisano również procedury do niezależnej diagnostyki oraz weryfikacji działania każdego z elementów opracowanego systemu syntezy mowy.

EN

This paper describes a speech synthesis system working on an embedded platform. The hardware layer of the application is based on an efficient 32-bit ARM Cortex-M4 microcontroller. Since building a complete Text-to-Speech system from scratch is a complex task, elements of an open-source project called eSpeak (which uses formant synthesis and therefore does not require storage of large data structures) were ported onto the proposed target platform, taking all of its limitations and requirements into account. The resulting system supports many languages and is capable of producing artificial speech directly from any text entered by the user.

8

Virtual keyboard controlled by eye gaze employing speech synthesis

Łopatka K., Rybacki R., Kunka B., Czyżewski A., Kostek B.

Elektronika : konstrukcje, technologie, zastosowania | 2011 | Vol. 52, nr 1 | pp. 39-42

EN

The article presents the speech synthesis integrated into the eye gaze tracking system. This approach can significantly improve the quality of life of physically disabled people who are unable to communicate. The virtual keyboard (QWERTY) is an interface which allows for entering the text for the speech synthesizer. First, this article describes a methodology of determining the fixation point on a computer screen. Then it presents an algorithm of concatenative speech synthesis used in the engineered solution. Both modules of the system described were created by the Multimedia Systems Department. The work of the entire system was verified in real conditions. Conclusions focusing on the usefulness of this approach are provided.

PL

W artykule przedstawiono zastosowanie syntezy mowy zintegrowanej z systemem śledzenia punktu fiksacji wzroku. Takie podejście w znaczący sposób może przyczynić się do poprawy jakości życia osób niepełnosprawnych fizycznie, które nie mają możliwości komunikowania się. Interfejsem umożliwiającym wprowadzanie do syntetyzera mowy tekstu jest wirtualna klawiatura z rozkładem klawiszy QWERTY. W pierwszej części artykułu przedstawiono sposób wyznaczania punktu fiksacji wzroku na monitorze komputerowym za pomocą stworzonego w Katedrze Systemów Multimedialnych systemu o nazwie Cyber-Oko. W drugiej części zaprezentowano algorytm syntezy mowy konkatenacyjnej, który jest wykorzystywany w zaproponowanym rozwiązaniu. Sprecyzowano odpowiednie wnioski na temat użyteczności takiego podejścia oraz zweryfikowano pracę systemu w warunkach rzeczywistych.

9

Automatic prosodic modification in a Text-To-Speech synthesizer of Polish language

Łopatka K., Suchomski P., Czyżewski A.

Elektronika : konstrukcje, technologie, zastosowania | 2011 | Vol. 52, nr 5 | pp. 106-110

EN

A Text-To-Speech synthesizer of Polish language with automatic prosodic modification is presented. The methods for automatic determination of accent and intonation are introduced. The application of prosodic speech processing algorithms to Text-To-Speech synthesis is presented. The impact of these modifications on the naturalness of the synthesized signal is discussed. The applied method is based on the TD-PSOLA algorithm. The developed Text-To-Speech Synthesizer is used in applications employing multimodal computer interfaces.

PL

Przedstawiono system syntezy mowy polskiej z funkcją automatycznej modyfikacji prozodii wypowiedzi. Opisane zostały metody automatycznego wyznaczania akcentu i intonacji wypowiedzi. Przedstawiono zastosowanie algorytmów przetwarzania sygnału mowy w procesie kształtowania prozodii. Omówiono wpływ zastosowanych modyfikacji na naturalność brzmienia syntezowanego sygnału. Zastosowana metoda oparta jest na algorytmie TD-PSOLA. Opracowany system syntezy mowy znajduje zastosowanie w aplikacjach wykorzystujących multimodalne interfejsy komputerowe.

10

Syntetyzer mowy uwzględniający prozodię wypowiedzi

Łopatka K., Czyżewski A.

Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej | 2010 | Nr 28 | pp. 105-108

PL

Przedstawiono system syntezy mowy polskiej uwzględniający w sposób automatyczny prozodię, tj. profil intonacyjny, tempo i akcenty wypowiedzi. Zastosowano syntezę konkatenacyjną z wykorzystaniem jednostek mowy zawierających przejścia między dwoma głoskami – difonów. Opisano poszczególne moduły wchodzące w skład syntetyzera: przetwarzanie tekstu, bazę jednostek mowy oraz algorytmy związane z tworzeniem syntetyzowanego sygnału. Przeprowadzono testy subiektywne potwierdzające wysoką zrozumiałość generowanej mowy i skuteczność modyfikacji prozodycznych. Przedstawiono możliwość zastosowania opisanego systemu w aplikacjach edukacyjnych lub terapeutycznych oraz interfejsach multimodalnych przeznaczonych dla osób niepełnosprawnych.

EN

The paper presents a Text-To-Speech synthesizer for the Polish language employing automatic prosodic modification. The speech signal is synthesized by concatenative synthesis using constant-length segments – diphones. The subsequent modules of the synthesizer are introduced. The language analysis and signal processing techniques employed are described. The synthesized speech yields high intelligibility and naturalness, as confirmed by listening tests. The proposed system can be used in educational and therapeutic applications or multimodal interfaces for disabled people.

11

System syntezy mowy polskiej do zastosowań w urządzeniach mobilnych

Barański P., Bronakowski Ł., Strumiłło P.

Elektronika : konstrukcje, technologie, zastosowania | 2010 | Vol. 51, nr 9 | pp. 78-80

PL

W artykule omówiono wykonany system syntezy mowy polskiej. System umożliwia syntezę bezpośrednio z tekstu ortograficznego. W celu dokonania transkrypcji fonetycznej opracowano jednoznakowy alfabet fonetyczny (1 znak - 1 fonem). Synteza jest realizowana metodą korpusowej selekcji jednostek fonetycznych. Jako jednostki fonetyczne wykorzystano difony. Niektóre difony mają kilka instancji różniących się kontekstem występowania. Każde słowo może być więc zsyntezowane na wiele sposobów. Sekwencja difonów dobierana jest za pomocą algorytmu Viterbiego w celu uzyskania optymalnego zestawu jednostek fonetycznych, zapewniając w ten sposób większą naturalność generowanej mowy.

EN

The article describes a speech synthesis system designed for the Polish language. The system converts text to speech by using simple transcription rules. Every phoneme corresponds to one transcription letter. The system applies the corpus-based method, which uses diphones at its core. Some diphones have several instances with different contexts of occurrence. Therefore, every word can be synthesized in many ways. The applied cost function estimates the quality of a given diphone connection. The adjacent diphones are compared in terms of spectral properties. The optimal sequence of diphones is then singled out by applying the Viterbi algorithm. This guarantees the minimal cost value, which reflects the best possible quality of the synthesized speech.
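The Viterbi-based selection of diphone instances summarised in this entry can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_units`, the representation of each diphone instance as a single number, and the absolute-difference join cost are all assumptions made for illustration, and every position is assumed to have at least one candidate.

```python
from typing import Callable, List, Sequence, Tuple


def select_units(
    candidates: Sequence[Sequence[float]],
    join_cost: Callable[[float, float], float],
) -> Tuple[float, List[int]]:
    """Pick one candidate per position so that the summed join cost is minimal.

    candidates[i] holds the (hypothetical) instances available for the i-th
    diphone; join_cost(prev, cur) scores how well two adjacent instances join.
    Returns the minimal total cost and the chosen candidate index per position.
    """
    # best[i][j]: minimal cost of any path ending at candidate j of position i
    best: List[List[float]] = [[0.0] * len(candidates[0])]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of candidate j
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for cur in candidates[i]:
            costs = [
                best[i - 1][k] + join_cost(prev, cur)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min])
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)

    # Trace the cheapest path backwards from the last position.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return total, path


# Toy data: each number stands in for a spectral feature at the unit boundary.
toy_candidates = [[1.0, 5.0], [4.0, 9.0], [4.5, 0.0]]
total_cost, chosen = select_units(toy_candidates, lambda a, b: abs(a - b))
```

Dynamic programming keeps the search linear in the number of positions, whereas enumerating every combination of instances grows exponentially with the length of the word.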

12

Implementation of Polish speech synthesis for the BOSS system

Demenko G., Möbius B., Klessa K.

Bulletin of the Polish Academy of Sciences. Technical Sciences | 2010 | Vol. 58, nr 3 | pp. 371-376

EN

The Bonn Open Synthesis System (BOSS) is an open-source software for the unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Section two details the structure of the Polish corpus and its segmental and prosodic annotation. The subsequent sections focus on the implementation of Polish TTS modules in the BOSS architecture (duration prediction and cost function) and the steps involved in preparing a new speech corpus for BOSS.

13

Design of text to speech synthesis system based on the harmonic and noise model

Sawicki A., Zubrycki P., Petrovsky A.

Zeszyty Naukowe Politechniki Białostockiej. Informatyka | 2009 | Z. 4 | pp. 111-125

EN

This is a proposal for a concatenative text-to-speech synthesizer for the Polish language, based on diphones and the "Harmonics and Noise Model" (HNM). HNM has been successfully applied in a speech encoder and decoder, resulting in high quality of processed speech at a low bit rate. Applying this model to a speech synthesis system yields good quality of synthesized speech and a small parameter database. The proposed design consists of two main modules. The Natural Language Processing (NLP) module analyses the written text and converts it into phonemes and diphones using morphological rules. At the same time, it determines the prosodic features used later to modify the parameters of the synthesized speech in order to obtain stress and intonation. The second module is a synthesis system derived from the speech decoder and preceded by a system that adapts the speech parameters according to prosodic rules. Synthesis from parameters works in the frequency domain and uses the frequency spectrum envelope, which makes it easy to modify the frequency, amplitude and duration of the signal when applying the prosodic rules. An algorithm for determining a continuous phase at the speech frame borders allows portions of synthesized speech and diphones to be concatenated without phase distortion at the joins. The speech synthesizer operates on a diphone database created by splitting a recorded speech signal into fragments representing pairs of phonemes. The sounds corresponding to diphones are analysed by the speech encoder. It provides the parameters that describe the harmonic and noise components of speech, using linear prediction filters and LSF coefficients, which results in a small diphone database.

PL

Artykuł przedstawia projekt konkatenacyjnego syntezatora mowy z tekstu dla języka polskiego, opartego na difonach i modelu Harmoniczne i Szum. Model Harmoniczne i Szum został z powodzeniem zastosowany w układzie kodera i dekodera mowy, dając w rezultacie dobrą jakość przetwarzanej mowy przy niskiej przepływności bitowej. Zastosowanie tego modelu do układu syntezy mowy pozwala na uzyskanie dobrej jakości syntezowanej mowy, oraz niewielki rozmiar bazy parametrów. Układ składa się z dwóch głównych modułów. Moduł Naturalnego Przetwarzania Języka służy do analizy i zamiany tekstu pisanego na fonemy oraz difony, przy wykorzystaniu reguł morfologicznych. Procesor tekstu wyznacza jednocześnie warunki prozodii związane z późniejszą modyfikacją parametrów syntezowanego głosu w celu uzyskania akcentowania i intonacji. Drugim układem jest moduł syntezy, oparty na dekoderze mowy poprzedzonym systemem adaptacji parametrów mowy w oparciu o wyznaczone wcześniej reguły prozodyczne. Układ syntezy mowy z parametrów działa w dziedzinie częstotliwości i bazuje na obwiedni spektrum, co w prosty sposób pozwala na modyfikację częstotliwości, amplitudy i czasu trwania sygnału przy stosowaniu reguł prozodycznych. Algorytm wyznaczania ciągłej fazy na granicach ramek sygnału mowy pozwala na łączenie fragmentów syntezowanej mowy oraz poszczególnych difonów bez zniekształceń fazowych na połączeniu. Syntezator mowy operuje na bazie difonów, stworzonej na podstawie fragmentaryzacji nagranego sygnału mowy na części, reprezentujące połączenia par fonemów. Dźwięki odpowiadające difonom są analizowane przez moduł analizy mowy. Dostarcza on ciąg parametrów reprezentujących harmoniczne i szumowe komponenty sygnału mowy, opisane za pomocą filtrów liniowej predykcji i współczynników LSF, dając w rezultacie niewielkiej wielkości bazę difonów.

14

Development of artificial neural network based speech synthesis for the Polish language

Kwolek M.

Czasopismo Techniczne. Mechanika | 2008 | R. 105, z. 3-M | pp. 141-147

EN

The paper describes an MLP network that learns to transcribe Polish text into phonemes and defines the transcription process. The transcription scheme is based on the SAMPA phonetic alphabet for the Polish language. The paper also shows how text samples are adapted to the network's requirements, that is, mapped to binary patterns and arranged into windows. It describes the learning process as well; the training data came from the database of Professor Krzysztof Marasek from the Polish-Japanese Institute of Information Technology.

PL

W niniejszym artykule opisano wykorzystanie sztucznej sieci neuronowej MLP do zamiany tekstu pisanego w języku polskim na fonemy. Zdefiniowano sposób przeprowadzenia transkrypcji fonetycznej. Schemat transkrypcji oparty jest na alfabecie fonetycznym SAMPA dla języka polskiego. Przedstawiono proces przystosowania próbek tekstowych dla potrzeb sieci, czyli zamiany na postać binarną oraz generowanie okna. Opisano również proces uczenia sieci, a jako dane uczące wykorzystano bazę profesora Krzysztofa Maraska z Polsko-Japońskiej Wyższej Szkoły Technik Komputerowych.
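The mapping of text to binary patterns with a sliding window, mentioned in this entry, might look roughly like the sketch below. The paper's actual encoding is not given in the abstract, so the alphabet, the `_` padding symbol, the window size of 3 and the one-hot scheme are all assumptions chosen for illustration.

```python
from typing import List

# Hypothetical input alphabet: '_' is an assumed padding symbol, followed by
# lowercase Polish letters. The paper's real symbol set is not stated.
ALPHABET = "_aąbcćdeęfghijklłmnńoóprsśtuwyzźż"


def one_hot(ch: str) -> List[int]:
    """Encode one character as a binary vector with a single 1."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(ch)] = 1  # raises ValueError for unknown characters
    return vec


def windows(word: str, size: int = 3) -> List[List[int]]:
    """Build one binary input pattern per letter.

    Each pattern concatenates the one-hot codes of the letter and its
    neighbours inside a window of `size` characters (size must be odd).
    """
    half = size // 2
    padded = "_" * half + word.lower() + "_" * half
    patterns = []
    for i in range(len(word)):
        pattern: List[int] = []
        for ch in padded[i:i + size]:
            pattern.extend(one_hot(ch))
        patterns.append(pattern)
    return patterns


patterns = windows("kot")
```

Each pattern would serve as one input vector of the network, with the phoneme of the centre letter as the training target; the window gives the network the left and right context that Polish letter-to-sound rules depend on.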

15

System dialogowy języka mówionego : przegląd problemów

Wiśniewski A. M.

Biuletyn Instytutu Automatyki i Robotyki | 2007 | R. 13, nr 24 | pp. 97-122

PL

Przedstawiono strukturę systemu dialogowego języka mówionego. Scharakteryzowano pożądane własności składników funkcjonalnych systemu: urządzenia rozpoznawania mowy, procesora językowego, sterownika (menedżera) dialogu i syntezatora mowy. Scharakteryzowano przykładowe realizacje systemów dialogowych języka mówionego.

EN

In this paper, the structure of a spoken language dialogue system was described. The underlying human language technologies were described: automatic speech recognizer, natural language understanding, dialogue manager, and speech synthesizer. The recent progress in spoken dialogue systems and some of the ongoing research challenges were presented.

16

Using casual speech phonology in synthetic speech

Shockey L.

Archives of Acoustics | 2007 | Vol. 32, No. 1 | pp. 101-109

EN

Alphabetic writing is a mixed blessing for speech science. Most scientists working in speech synthesis and speech recognition assume unconsciously that spoken language is like written language, i.e. it is composed of a string of items (letters/phonemes) which should be realised in all but substandard writing/speech. My research shows that there are very many shortcuts taken by speakers of English on a regular basis in normal (not sloppy or casual) speech. These are not included in speech synthesis packages, but if they were, the output would be closer to the real thing and, I contend, would be considerably easier to understand.

17

Synthesis of fundamental frequency contours for Standard Chinese based on superpositional and tone nucleus models

Hirose K., Sun Q., Minematsu N.

Archives of Acoustics | 2007 | Vol. 32, No. 1 | pp. 41-50

EN

A method for generating sentence F0 contours of Standard Chinese speech is developed. It is based on superposing tone components on phrase components in logarithmic frequency. While tone components are language specific, phrase components are assumed to be more language universal. Taking this into account, the method treats the two kinds of components differently. The tone components are generated by concatenating F0 patterns of tone nuclei, which are predicted by a corpus-based scheme, while the phrase components are generated by rules. Experiments on F0 contour generation were conducted using 100 news utterances by a female speaker. The first experiments concerned the generation of tone components, with the phrase components of the original utterances used unchanged. The results showed that the method could generate F0 contours close to those of the target speech. Speech synthesis was conducted by replacing the original F0 contours with the generated ones using TD-PSOLA. A high average score of 4.5 on a 5-point scale was obtained in listening experiments on the quality of the synthetic speech. The second experiments concerned the generated phrase components, with the tone components extracted from the original utterances. Although the synthetic speech with generated F0 contours sounded mostly natural, there were occasional "degraded sounds" caused by a mismatch between the phrase and tone components. To cope with the mismatch, a two-step method was developed, in which information on the phrase contours was used for the prediction of tone components. The validity of the method was shown through perceptual experiments on synthesized speech.
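Superposition in logarithmic frequency, as described in this entry, means that the components add in the log domain and therefore multiply in Hz. The sketch below shows only that arithmetic. The function name, the per-frame list representation and the sample values are assumptions, and the paper's actual component models are not reproduced.

```python
import math
from typing import List, Sequence


def superpose_f0(phrase_log: Sequence[float], tone_log: Sequence[float]) -> List[float]:
    """Add tone components to phrase components in ln(Hz), frame by frame.

    phrase_log: phrase component per frame, as the natural log of a frequency in Hz.
    tone_log:   tone component per frame, as an offset in the same log domain.
    Returns the resulting F0 contour in Hz.
    """
    if len(phrase_log) != len(tone_log):
        raise ValueError("phrase and tone components must have the same length")
    return [math.exp(p + t) for p, t in zip(phrase_log, tone_log)]


# Hypothetical two-frame example: a declining phrase contour (200 Hz, 180 Hz)
# with a tone component that raises the second frame by a factor of 1.25.
contour = superpose_f0(
    [math.log(200.0), math.log(180.0)],
    [0.0, math.log(1.25)],
)
```

Because the sum is taken in the log domain, a fixed tone offset scales the pitch by the same ratio wherever the phrase contour happens to be.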

18

Transcription-based automatic segmentation of speech

Szymański M., Grocholewski S.

Archives of Control Sciences | 2005 | Vol. 15, no. 3 | pp. 461-468

EN

An important element of today's speech systems is the set of recorded wave files annotated with a sequence of phonemes and boundary time-points. Manual segmentation of speech is a very laborious task, hence the need for automatic segmentation algorithms. However, manual segmentation still outperforms automatic segmentation, and at the same time the quality of the resulting synthetic voice depends heavily on the accuracy of the phonetic segmentation. This paper describes our methodology and implementation of automatic speech segmentation, emphasizing its new elements.

19

Blue Voice - a specialized speech-aid system for hardly speaking people

Dąbrowski A., Kardyś P., Wysokiński D., Kwapisz P., Dąbrowski M., Koziorowski P.

Foundations of Control and Management Sciences | 2004 | No. 1 | pp. 21-33

EN

This article presents a specialized speech-aid system meant for people with impairment of the speech organs, who are either mute or whose speech is hardly intelligible due to surgical operations involving partial or full laryngectomy. Our experimental prototype system, called "Blue Voice", enables these people to communicate with society again, thus bringing them back to normal life and activity. Blue Voice is named after the Bluetooth wireless telecommunication standard, which is used for communication between the Blue Voice components. Our system is a modern electronic (digital) alternative to various classical and rather cumbersome mechanical-pneumatic devices, which are still commonly used to improve the intelligibility of esophageal talkers.

20

Wybór jednostek elementarnych dla systemów syntezy mowy

Fabian P.

Studia Informatica | 2003 | Vol. 24, nr 4 | pp. 29-38

EN

Two common methods of speech synthesis are parametric synthesis and concatenation of basic speech units. Concatenation joins speech units together in a selected domain. The quality of the speech synthesis grows with the length of the basic speech units in the vocabulary; ideally, one possible solution would be to record a large corpus of continuous speech. Collecting a set of elementary speech units, such as polyphones, makes it possible to use the second method for the Polish language. Speech synthesis is not a new problem, and there are many commercial products, but their quality for less popular languages, such as Polish, is much worse than for English, the most frequently studied language. The presented approach makes possible a fast optimization of a units database for speech synthesis.

PL

Główne metody syntezy mowy to metody parametryczne z interpolacją parametrów i konkatenacyjne z zestawianiem wypowiedzi w wybranej dziedzinie z fragmentów istniejących nagrań. Zestawianie daje tym lepsze efekty, im dłuższe są jednostki w odpowiednich kontekstach. Zgromadzenie odpowiednio dużej bazy elementarnych nagrań (polifonów) pozwala zastosować drugą metodę do syntezy mowy w języku polskim. Jakość istniejących syntezatorów dla mniej popularnych języków, np. polskiego, jest znacznie niższa niż uzyskana dla najczęściej badanego języka angielskiego. Przedstawiona koncepcja automatycznego budowania bazy polifonów pozwala na szybką optymalizację bazy jednostek fonetyczno-akustycznych do celów syntezy mowy.