Search results - Biblioteka Nauki

Results found: 3

Using frequent pattern mining algorithms in text analysis

Ożdżyński P., Zakrzewska D.

2017

Vol. 6, No. 3

pp. 213-222

In text mining, the effectiveness of methods depends on how documents are represented. Representations based on frequent word sequences are used in tasks such as categorization, clustering and topic modelling. This paper compares different algorithms for finding frequent word sequences. The techniques considered include methods designed for market basket analysis, such as GSP and PrefixSpan, as well as a method based on a suffix array. They are compared with a new approach that searches for maximal frequent word sequences in document sets. The performance of the algorithms is evaluated in terms of execution time on the test collections considered.
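
For readers unfamiliar with the sequential-pattern techniques named above, the sketch below shows the PrefixSpan idea applied to word sequences. It is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: each document is assumed to be a pre-tokenised list of words, every sequence element is a single word (PrefixSpan in general handles itemsets), support is counted in documents, and gaps between matched words are allowed. The function name `prefixspan` and the toy documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def prefixspan(docs: List[List[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every word subsequence (gaps allowed) that occurs in at least
    `min_support` documents, mapped to its document support."""
    results: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[Tuple[int, int]]) -> None:
        # `projected` is the projected database: one (doc index, suffix start)
        # pair per document that contains the current prefix.
        support: Dict[str, Set[int]] = defaultdict(set)
        for doc_id, start in projected:
            for word in docs[doc_id][start:]:
                support[word].add(doc_id)
        for word, doc_ids in support.items():
            if len(doc_ids) < min_support:
                continue  # an infrequent prefix cannot have frequent extensions
            extended = prefix + (word,)
            results[extended] = len(doc_ids)
            # Project each suffix past the first occurrence of `word`.
            next_projected = []
            for doc_id, start in projected:
                doc = docs[doc_id]
                for pos in range(start, len(doc)):
                    if doc[pos] == word:
                        next_projected.append((doc_id, pos + 1))
                        break
            mine(extended, next_projected)

    mine((), [(doc_id, 0) for doc_id in range(len(docs))])
    return results


docs = [
    "data mining of text data".split(),
    "text mining finds frequent patterns".split(),
    "frequent text mining patterns".split(),
]
patterns = prefixspan(docs, min_support=2)
for seq, sup in sorted(patterns.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(" ".join(seq), sup)
```

Because gaps are allowed, `("text", "patterns")` is reported as frequent even though the two words are never adjacent; a suffix-array approach, by contrast, naturally finds contiguous word sequences. Keeping only maximal sequences would further discard any pattern contained in a longer frequent one, such as `("text", "mining")` here.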

A search of significant phrases for building topic models in text documents

Ożdżyński P., Zakrzewska D.

Vol. 5, No. 2

pp. 205-214

The huge number of documents in digital libraries calls for efficient methods of exploring the information they contain. "Topic modeling" is considered one of the most effective of these methods. In contrast to the commonly used approaches, which rely on occurrences of single words, this paper considers building topic models based on phrases. We propose a methodology for creating a set of significant word sequences, which limits the search space to phrases that contain them. The methodology is evaluated in experiments on real text datasets, and the results are compared with those obtained using the LDA algorithm.
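
The abstract does not say how word sequences are judged significant, so the sketch below only illustrates the general idea of restricting the phrase search space. It assumes a hypothetical document-frequency criterion for significant words (`min_df`, `max_df_ratio`), tokens that are already lower-cased with stop words removed, and contiguous n-grams as candidate phrases; none of these choices is taken from the paper. The surviving phrases could then be treated as tokens for a topic model such as LDA, which is not shown.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def significant_words(docs: List[List[str]], min_df: int, max_df_ratio: float) -> Set[str]:
    """Hypothetical significance test: keep words that occur in at least
    `min_df` documents but in no more than `max_df_ratio` of all documents."""
    df = Counter(word for doc in docs for word in set(doc))
    limit = max_df_ratio * len(docs)
    return {word for word, n in df.items() if min_df <= n <= limit}


def candidate_phrases(
    docs: List[List[str]], significant: Set[str], max_len: int = 3, min_count: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Count contiguous n-grams (2 <= n <= max_len) that contain at least one
    significant word; keep those occurring at least `min_count` times."""
    counts: Counter = Counter()
    for doc in docs:
        for n in range(2, max_len + 1):
            for i in range(len(doc) - n + 1):
                gram = tuple(doc[i : i + n])
                # Phrases without any significant word are never counted.
                if any(word in significant for word in gram):
                    counts[gram] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}


docs = [
    "neural network learns topic model".split(),
    "neural network topic model inference".split(),
    "topic model digital library".split(),
    "digital library catalogue".split(),
]
significant = significant_words(docs, min_df=2, max_df_ratio=0.5)
phrases = candidate_phrases(docs, significant)
print(sorted(phrases.items()))
```

Here `("topic", "model")` is never counted, although it occurs three times, because both of its words appear in too many documents to pass the assumed significance test; only phrases built around a significant word are searched.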

Adaptive information extraction from structured text documents

Ożdżyński P., Zakrzewska D.

Vol. 3, No. 4

pp. 261-272

Effective analysis of structured documents can determine the performance of management information systems. This paper considers an adaptive method of information extraction from structured text documents. We assume that the documents belong to thematic groups and that the required set of information can be determined a priori. Knowledge of the document structure makes it possible to identify blocks in which certain information is more likely to appear. The result is structured data that can be analysed further. The proposed solution uses dictionaries and inflection analysis and can be applied to Polish texts. The approach can be used to extract information from official letters, information sheets and product specifications.
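
The idea of combining block priorities with dictionary look-ups can be sketched roughly as follows. This is not the authors' method: the field specification (`FieldSpec`), the blank-line block splitting, the "text after the keyword or colon" value rule, and especially the suffix-stripping `stem` function, a crude stand-in for real Polish inflection analysis, are all illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Crude stand-in for inflection analysis (assumption): strip one common Polish
# ending so that inflected forms such as "faktury" and "fakturze" share a stem.
ENDINGS = sorted(["ami", "ach", "om", "ie", "ze", "ą", "ę", "y", "i", "a", "u", "e", "o"], key=len, reverse=True)


def stem(word: str) -> str:
    word = word.lower()
    for ending in ENDINGS:  # longest endings are tried first
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


@dataclass
class FieldSpec:
    name: str
    keywords: List[str]  # dictionary entries (base forms) that announce the field
    preferred_blocks: List[int] = field(default_factory=list)  # blocks searched first


def split_blocks(text: str) -> List[str]:
    # Assumption: blocks are separated by blank lines.
    return [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]


def find_value(block: str, stems: Set[str]) -> Optional[str]:
    for line in block.splitlines():
        for match in re.finditer(r"\w+", line):
            if stem(match.group()) in stems:
                rest = line[match.end():]
                # Take the text after a colon if one follows the keyword,
                # otherwise everything after the keyword.
                value = rest.split(":", 1)[-1].strip()
                if value:
                    return value
    return None


def extract(text: str, specs: List[FieldSpec]) -> Dict[str, Optional[str]]:
    blocks = split_blocks(text)
    result: Dict[str, Optional[str]] = {}
    for spec in specs:
        stems = {stem(k) for k in spec.keywords}
        # Search the blocks where the field is most probable first, then the rest.
        order = [i for i in spec.preferred_blocks if i < len(blocks)]
        order += [i for i in range(len(blocks)) if i not in order]
        result[spec.name] = next(
            (v for v in (find_value(blocks[i], stems) for i in order) if v), None
        )
    return result


# Toy Polish letter: "Dotyczy: faktury nr ..." = "Re: invoice no. ...",
# "Termin płatności" = "payment term".
letter = """Firma ABC Sp. z o.o.
ul. Polna 5, Łódź

Dotyczy: faktury nr 123/2017

Prosimy o korektę faktury.
Termin płatności: 14 dni"""

specs = [
    FieldSpec("invoice", ["faktura"], preferred_blocks=[1]),
    FieldSpec("payment_term", ["termin"], preferred_blocks=[2]),
]
print(extract(letter, specs))
```

The base form "faktura" in the dictionary matches the inflected "faktury" in the letter only because both reduce to the same stem; a proper morphological analyser would replace `stem` in any serious use.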