Comparative study of SQLite and Berkeley DB implementations of n-gram model of Polish language

Skurzok, D.; Ziółko, B.; Pohl, A.; Jadczyk, T.; Mąsior, M.

Article - details

Article title

Comparative study of SQLite and Berkeley DB implementations of n-gram model of Polish language

Authors

Skurzok D., Ziółko B., Pohl A., Jadczyk T., Mąsior M.

Identifiers

Title variants

Porównawcze studium implementacji modelu n-gramowego języka polskiego w SQLite i Berkeley DB

Publication languages

Abstracts

Aspects of applying databases in computational linguistics are presented. A dictionary and an n-gram model from the AGH automatic speech recognition system are described as an example. In this case, the Berkeley DB implementation shows a significant advantage over the SQLite implementation in terms of time efficiency.

Keywords

speech recognition, natural language processing, dictionary

Publisher

Wydawnictwo Politechniki Śląskiej

Journal

Studia Informatica

Year

2012

Volume

Vol. 33, No. 2B

Pages

153-162

Physical description

Bibliography: 25 items.

Contributors

author

Skurzok D.

author

Ziółko B.

author

Pohl A.

author

Jadczyk T.

author

Mąsior M.

Akademia Górniczo-Hutnicza, Katedra Elektroniki, skurzok@agh.edu.pl

Bibliography

1. Ziółko B., Ziółko M.: Przetwarzanie mowy. Wydawnictwa AGH, Kraków 2011.
2. Ziółko M., Gałka J., Ziółko B., Jadczyk T., Skurzok D., Mąsior M.: Automatic Speech Recognition System Dedicated for Polish. Show and tell session, Interspeech, 2011.
3. Dijkstra E. W.: A Note on Two Problems in Connexion with Graphs. Numerische Mathematik, 1959.
4. Ziółko B., Skurzok D.: N-grams model for Polish. Speech and Language Technologies, Book 2, InTech Publisher, 2011.
5. Mąsior M., Ziółko B., Skurzok D., Jadczyk T.: Baza danych słownika języka polskiego ze statystykami słów dla systemu automatycznego rozpoznawania mowy (in English: A database of Polish dictionary with word statistics for automatic speech recognition). Studia Informatica, Vol. 32, No. 2B(97), Wydawnictwo Politechniki Śląskiej, Gliwice 2011, pp. 349-357.
6. Ziółko B., Skurzok D., Michalska M.: Polish n-grams and their correction process. The 4th International Conference on Multimedia and Ubiquitous Engineering, 2010.
7. Leavitt N.: Will NoSQL databases live up to their promise? Computer, 43(2), 2010, pp. 12-14.
8. Stonebraker M.: SQL databases v. NoSQL databases. Communications of the ACM, 53(4), 2010, pp. 10-11.
9. Tudorica B., Bucur C.: A comparison between several NoSQL databases with comments and notes. 10th IEEE RoEduNet International Conference, 2011, pp. 1-5.
10. Pinkerton B.: Finding what people want: Experiences with the WebCrawler. The Second International World Wide Web Conference, Vol. 94, Chicago 1994, pp. 17-20.
11. Ide N., Veronis J.: Text encoding initiative: Background and contexts. Kluwer Academic Publishing, 1995.
12. Daciuk J.: Incremental Construction of Finite-State Automata and Transducers, and their Use in the Natural Language Processing, 1998.
13. Woliński M.: Morfeusz - a practical tool for the morphological analysis of Polish. Intelligent information processing and web mining, 2006, pp. 511-520.
14. Przepiórkowski A.: Korpus IPI PAN. Wersja wstępna. Instytut Podstaw Informatyki PAN, 2004.
15. Kubis M.: An Access Layer to PolNet-Polish WordNet. Human Language Technology. Challenges for Computer Science and Linguistics, 2011, pp. 444-455.
16. Woliński M.: A Relational Model of Polish Inflection in Grammatical Dictionary of Polish. Human Language Technology. Challenges of the Information Society, 2009, pp. 96-106.
17. Piasecki M., Szpakowicz S., Broda B.: A Wordnet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław 2009.
18. Pohl A.: The Semi-automatic Construction of the Polish Cyc Lexicon. Investigationes Linguisticae, 21, 2010.
19. Owens M.: The Definitive Guide to SQLite. Apress, 2006.
20. Olson M., Bostic K., Seltzer M.: Berkeley DB. Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference, 1999, pp. 183-192.
21. Horak A., Pala K., Rambousek A., Povolny M.: DEBVisDic - First Version of New Client-Server Wordnet Browsing and Editing Tool. Proceedings of the Third International WordNet Conference, 2006, pp. 325-328.
22. Dorosz K.: Usage of Dedicated Data Structures for URL Databases in a Large-scale Crawling. Computer Science, 10, 2009, pp. 7-17.
23. Hirsimaki T., Pylkkonen J., Kurimo M.: Importance of high-order n-gram models in morph-based speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 17(4), 2009, pp. 724-732.
24. Whittaker E., Woodland P.: Language modelling for Russian and English using words and classes. Computer Speech and Language, 17, 2003, pp. 87-104.
25. Przepiórkowski A., Górski R. L., Łaziński M., Pęzik P.: Recent Developments in the National Corpus of Polish. Proceedings of LREC, 2010.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSL2-0026-0074