Search results - BazTech

1

A study of parallel techniques for dimensionality reduction and its impact on the quality of text processing algorithms

Pietroń M., Wielgosz M., Karwatowski M., Wiatr K.

Measurement Automation Monitoring | 2015 | Vol. 61, No. 7, pp. 352-353

EN

The presented algorithms employ the Vector Space Model (VSM) and its enhancements, such as TFIDF (Term Frequency Inverse Document Frequency), together with Singular Value Decomposition (SVD). TFIDF was applied to emphasize the important features of documents, and SVD was used to reduce the analysis space. A series of experiments was then conducted, revealing important properties of the algorithms and their accuracy. The accuracy of the algorithms was estimated in terms of their ability to match the human classification of the subject. For the unsupervised algorithms, entropy was used as the quality evaluation measure. The combination of VSM, TFIDF, and SVD turned out to be the best-performing unsupervised algorithm, with an entropy of 0.16.
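The entropy measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one common definition: the entropy of the human-assigned classes inside each cluster, normalized by the logarithm of the number of classes and weighted by cluster size. The function name `clustering_entropy` and the normalization are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted, normalized entropy of a clustering (0 = pure clusters).

    Assumed definition (not taken from the paper): for each cluster, compute
    the entropy of the reference class distribution inside it, divide by
    log(number of classes), and average with weights proportional to
    cluster size.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")

    n = len(cluster_ids)
    num_classes = len(set(class_labels))
    if n == 0 or num_classes < 2:
        return 0.0  # nothing to mix: entropy is zero by convention

    # Count how many documents of each class fall into each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    total = 0.0
    for counts in per_cluster.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log(k / size) for k in counts.values())
        total += (size / n) * (h / math.log(num_classes))
    return total


# Two pure clusters versus two fully mixed clusters.
pure = clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

Under this definition lower values are better: a clustering that reproduces the human classification exactly scores 0, and a clustering whose clusters mix all classes evenly scores 1.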

2

Bag of Words : Quality Issues of Near-Duplicate Image Retrieval

Paradowski M., Durak M., Broda B.

Machine Graphics and Vision | 2014 | Vol. 23, No. 1/2, pp. 83-96

EN

This paper addresses the problem of large-scale near-duplicate image retrieval. Issues related to generating the dictionary of visual words are discussed. A new spatial verification routine is proposed; it incorporates neighborhood consistency and term weighting, and it is integrated into the Bhattacharyya coefficient. The proposed approach reaches almost 10% higher retrieval quality compared to other recently reported state-of-the-art methods.
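For orientation, the plain Bhattacharyya coefficient between two bag-of-visual-words histograms can be sketched as follows. This shows only the standard coefficient, the sum over bins of sqrt(p_i * q_i) for normalized histograms; the spatial verification, neighborhood consistency, and term weighting proposed in the paper are not described in the abstract and are not reproduced here. The function name and the toy histograms are hypothetical.

```python
import math


def bhattacharyya_coefficient(hist_a, hist_b):
    """Bhattacharyya coefficient of two non-negative histograms.

    Both histograms are normalized to sum to 1 first, so the result lies
    in [0, 1]: 1 for identical distributions, 0 for disjoint ones.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")

    sum_a, sum_b = sum(hist_a), sum(hist_b)
    if sum_a <= 0 or sum_b <= 0:
        return 0.0  # an empty histogram shares no mass with anything

    return sum(
        math.sqrt((a / sum_a) * (b / sum_b))
        for a, b in zip(hist_a, hist_b)
    )


# Hypothetical visual-word counts for three images.
same = bhattacharyya_coefficient([2, 2, 0, 0], [1, 1, 0, 0])
disjoint = bhattacharyya_coefficient([1, 1, 0, 0], [0, 0, 1, 1])
```

Because the histograms are normalized before comparison, two images with the same proportions of visual words score 1 regardless of how many features each contains.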

3

Document Clustering : Concepts, Metrics and Algorithms

Tarczynski T.

International Journal of Electronics and Telecommunications | 2011 | Vol. 57, No. 3, pp. 271-277

EN

Document clustering, also referred to as text clustering, is a technique of unsupervised document organisation. Text clustering is used to group documents into subsets of texts that are similar to each other. These subsets are called clusters. Document clustering algorithms are widely used in web search engines to produce results relevant to a query. An example of the practical use of these techniques is the Yahoo! hierarchy of documents [1]. Another application of document clustering is browsing, which is defined as a search session without a well-specified goal. Browsing techniques rely heavily on document clustering. In this article we examine the most important concepts related to document clustering. Besides the algorithms, we present a comprehensive discussion of the representation of documents, the calculation of similarity between documents, and the evaluation of cluster quality.
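As an illustration of "calculation of similarity between documents", the sketch below computes cosine similarity between raw term-frequency vectors. The abstract does not say which similarity measures the article covers, so the choice of cosine similarity, the whitespace tokenization, and the function name are assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term-frequency vectors.

    Tokenization is a naive lowercase whitespace split, which is enough
    to show the idea but not suitable for real corpora.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())

    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


identical = cosine_similarity("text clustering groups documents",
                              "text clustering groups documents")
unrelated = cosine_similarity("text clustering", "image retrieval")
partial = cosine_similarity("document clustering", "document retrieval")
```

A clustering algorithm would typically place documents with high pairwise similarity in the same cluster; here `partial` falls between the two extremes because the texts share one of their two terms.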

4

Analiza skupień i redukcja wymiarowości w hierarchicznym modelu korpusowym języka [Cluster analysis and dimensionality reduction in a hierarchical corpus-based language model]

Wicijowski J., Ziółko B.

Studia Informatica | 2010 | Vol. 31, No. 2A, pp. 133-145

EN

The article presents a semantic model of the Polish language built from Polish Wikipedia texts. The model is part of an automatic speech recognition system, where it verifies sentence hypotheses. Methods of filtering and clustering the documents, which aim to accelerate the computations, are presented. The authors emphasize delegating processing tasks to the database engine where doing so improves performance.