Cluster analysis and dimensionality reduction in a hierarchical corpus model of language

Wicijowski, J.; Ziółko, B.

Article - details

Article title

Analiza skupień i redukcja wymiarowości w hierarchicznym modelu korpusowym języka

Authors

Wicijowski J., Ziółko B.

Identifiers

Title variants

Cluster analysis and dimensionality reduction in a hierarchical corpus model

Publication languages

Abstracts

The article presents a semantic model of the Polish language built from the text of the Polish Wikipedia. The model is part of an automatic speech recognition system, where it verifies sentence hypotheses. Methods for filtering and clustering the documents, intended to speed up the computations, are described. The authors emphasize delegating processing tasks to the database engine wherever this improves performance.
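The idea of delegating work to the database engine can be illustrated with a minimal sketch. It is not the authors' implementation: the table name `occurrence`, the function `build_filtered_counts`, the `min_df` threshold, and the sample data are all hypothetical. The sketch assumes one row per term occurrence and lets SQLite both count terms per document and filter out terms that appear in too few documents, so only the nonzero entries of the document-term matrix reach Python.

```python
import sqlite3


def build_filtered_counts(rows, min_df=2):
    """Return {doc_id: {term: count}} for terms present in >= min_df documents.

    `rows` is an iterable of (doc_id, term) pairs, one per term occurrence.
    Counting and filtering are both done inside SQLite (hypothetical schema).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE occurrence (doc_id INTEGER, term TEXT)")
    con.executemany("INSERT INTO occurrence VALUES (?, ?)", rows)

    # The subquery keeps terms whose document frequency reaches min_df;
    # the outer query counts occurrences per (document, term) pair.
    query = """
        SELECT o.doc_id, o.term, COUNT(*) AS n
        FROM occurrence AS o
        WHERE o.term IN (
            SELECT term FROM occurrence
            GROUP BY term
            HAVING COUNT(DISTINCT doc_id) >= ?
        )
        GROUP BY o.doc_id, o.term
        ORDER BY o.doc_id, o.term
    """
    matrix = {}
    for doc_id, term, n in con.execute(query, (min_df,)):
        # Sparse dict-of-dicts representation: only nonzero counts are stored.
        matrix.setdefault(doc_id, {})[term] = n
    con.close()
    return matrix


# Hypothetical sample data: three tiny "documents".
rows = [
    (1, "cat"), (1, "cat"), (1, "dog"),
    (2, "cat"), (2, "house"),
    (3, "dog"), (3, "dog"), (3, "forest"),
]
counts = build_filtered_counts(rows, min_df=2)
```

Here "house" and "forest" each occur in a single document, so they are dropped by the `HAVING` clause before any data leaves the database. The resulting nested dictionary could then be loaded into a sparse matrix type for clustering.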

Keywords

cluster analysis; vector space model; document-term matrix; sparse matrix; sqlite3; Wikipedia; mediawiki

Publisher

Wydawnictwo Politechniki Śląskiej

Journal

Studia Informatica

Year

2010

Volume

Vol. 31, no. 2A

Pages

133-145

Physical description

Bibliography: 10 items.

Contributors

author

Wicijowski J.

author

Ziółko B.

Akademia Górniczo-Hutnicza, Department of Electronics, jan.wicijowski@agh.edu.pl

Bibliography

1. Ziółko B., Manandhar S., Wilson R.C.: Bag-of-words modelling for speech recognition. 2009 International Conference on Future Computer and Communication. ICFCC 2009, April 2009, pp. 646-650.
2. Salton G.: Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1989.
3. Salton G., Buckley C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management, 1988, pp. 513-523.
4. Jones E., Oliphant T., Peterson P. et al.: SciPy: Open source scientific tools for Python. SciPy Documentation: Sparse matrices. http://www.scipy.org/
5. Martinez W.L., Martinez A.R.: Exploratory Data Analysis with MATLAB (Computer Science and Data Analysis). Chapman & Hall/CRC, 2004.
6. Deerwester S., Dumais S.T., Furnas G.W., Landauer T.K., Harshman R.: Indexing by latent semantic analysis. Journal of the American Society For Information Science, 41, 1990.
7. Kohonen T.: Self-Organizing Maps. Springer-Verlag, Berlin 1995/1997.
8. Ntoulas A., Cho J., Olston C.: What’s new on the web? The evolution of the web from a search engine perspective. WWW ’04: Proceedings of the 13th International Conference on World Wide Web, New York, NY, USA, 2004. ACM, pp. 1-12.
9. The Mathworks. Matlab Code Vectorization Guide. http://www.mathworks.com/support/tech-notes/1100/1109.html
10. Jones E., Oliphant T., Peterson P. et al.: SciPy: Open source scientific tools for Python. SciPy Documentation: A beginners guide to using Python for performance computing. http://www.scipy.org/PerformancePython

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSL7-0046-0017