Search results - BazTech

1

Mechanizm identyfikacji i klasyfikacji treści

Niewiarowski A., Stanuszek M.

Studia Informatica | 2013 | Vol. 34, nr 2B | 205--222

PL

Artykuł opisuje mechanizm identyfikacji i klasyfikacji treści, oparty na metodzie ważenia terminów, bazującej na odwrotnej częstości dokumentowej, częstości wystąpienia terminu i odległości Levenshteina. Zaproponowany mechanizm zaimplementowano w programie analizującym tematy i opisy prac dyplomowych, w celu automatycznego doboru promotorów i recenzentów.

EN

This paper presents a mechanism for identifying and classifying content, based on a term-weighting method that combines inverse document frequency, term frequency, and Levenshtein distance. The proposed mechanism is implemented in a program that analyzes the topics and descriptions of diploma theses in order to automatically select supervisors and reviewers.
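As a rough illustration of how the three ingredients named in the abstract can fit together, the sketch below weights a term by term frequency times inverse document frequency, and uses Levenshtein distance so that near-identical spellings count as the same term. It is not the authors' implementation: the function names, the `max_dist` threshold, and the choice of idf = log(N / df) are assumptions made for this example.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_tf(term, tokens, max_dist=1):
    """Share of tokens within max_dist edits of the term (assumed threshold)."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if levenshtein(term, t) <= max_dist)
    return hits / len(tokens)


def tf_idf(term, doc_tokens, corpus, max_dist=1):
    """tf * log(N / df), using fuzzy matching for both tf and df."""
    df = sum(1 for d in corpus if fuzzy_tf(term, d, max_dist) > 0)
    if df == 0:
        return 0.0
    return fuzzy_tf(term, doc_tokens, max_dist) * math.log(len(corpus) / df)


# Hypothetical thesis descriptions, already tokenized; note the misspelling.
corpus = [
    ["neural", "networks", "classification"],
    ["database", "indexing"],
    ["text", "classifcation", "mining"],
]
weight = tf_idf("classification", corpus[0], corpus)
```

With fuzzy matching, the misspelled token in the third description still contributes to the document frequency of "classification", which lowers its weight compared with exact matching.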

2

Analiza rozmieszczenia wyrazów w zdaniach w celu detekcji czasowników

Niewiarowski A.

Studia i Materiały Informatyki Stosowanej | 2012 | nr 6 | 43--46

PL

Artykuł przedstawia analizę wyników próby detekcji czasowników wyprowadzonych przez mechanizm typu text mining, oparty o model cech wyrazów w zdaniach, bazujący na strukturze relacyjnej bazy danych. Podjęta została próba stworzenia mechanizmu wykrywającego czasowniki w oparciu o ich rozmieszczenie statystyczne w zdaniach. W artykule przeanalizowane zostały dokumenty tekstowe polskie i niemieckie, będące felietonami o tematyce z różnych dziedzin życia, w reprezentatywnej liczbie 50 artykułów polskich i 50 artykułów niemieckich.

EN

This paper analyzes the results of an attempt to detect verbs using a text-mining mechanism built on a model of word features in sentences and backed by a relational database structure. The mechanism attempts to detect verbs from their statistical distribution within sentences. The analyzed material consists of Polish and German feuilletons on topics from various areas of life, in a representative sample of 50 Polish and 50 German articles.

3

SQL-based approach to distributed and incremental association rule mining

Kona H., Chakravarthy S., Arora A.

Foundations of Computing and Decision Sciences | 2006 | Vol. 31, No. 1 | 5--26

EN

Database mining is the process of extracting interesting and previously unknown patterns and correlations from data stored in Database Management Systems (DBMSs). Association rule mining is the process of discovering items that tend to occur together in transactions. If the data to be mined is stored as relations in multiple databases, a partitioned or distributed approach is more appropriate than moving data from one database to another. Also, incremental addition of data to the data set should not necessitate re-computation of rules for the entire data set. This paper focuses on partitioned and incremental approaches to association rule mining for data stored in relational DBMSs. It proposes a partitioning approach that is very effective for distributed databases compared with the main-memory partitioned approach. The approach uses an SQL-based K-way join algorithm and its optimizations. A second alternative that trades accuracy for performance is also presented. The results indicate that, beyond a certain data set size, this alternative preserves accuracy while delivering better performance. The incremental association rule mining algorithm reduces the work of re-computing the rules each time new data is added to the database. The paper implements the incremental algorithm using the negative border concept with a number of optimizations. Extensive experiments are performed and results are presented for both partitioned and incremental approaches using IBM DB2/UDB and Oracle 8i.
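To make the idea of SQL-based support counting more concrete, the sketch below shows the two-item case of a K-way join: a candidate table is joined with one copy of the transaction table per item, and groups that meet the minimum support are kept. This is only a minimal sketch under assumptions, not the paper's implementation: it uses Python's built-in `sqlite3` module instead of DB2 or Oracle, and the table names (`t`, `f1`, `c2`), the function name, and the sample data are hypothetical. Partitioning and the incremental negative-border step are not shown.

```python
import sqlite3


def frequent_pairs(transactions, min_support):
    """Return (item1, item2, support) rows for frequent 2-itemsets."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Transactions stored relationally: one (tid, item) row per item.
    cur.execute("CREATE TABLE t (tid INTEGER, item TEXT)")
    cur.executemany(
        "INSERT INTO t VALUES (?, ?)",
        [(tid, item) for tid, items in transactions.items()
         for item in set(items)],
    )

    # F1: items that are frequent on their own.
    cur.execute("CREATE TABLE f1 (item TEXT)")
    cur.execute(
        "INSERT INTO f1 SELECT item FROM t GROUP BY item HAVING COUNT(*) >= ?",
        (min_support,),
    )

    # C2: candidate pairs built from frequent items only.
    cur.execute("CREATE TABLE c2 (item1 TEXT, item2 TEXT)")
    cur.execute(
        "INSERT INTO c2 SELECT a.item, b.item "
        "FROM f1 AS a JOIN f1 AS b ON a.item < b.item"
    )

    # Support counting: join C2 with one copy of t per item (k = 2 here).
    rows = cur.execute(
        """
        SELECT c.item1, c.item2, COUNT(*) AS support
        FROM c2 AS c
        JOIN t AS t1 ON t1.item = c.item1
        JOIN t AS t2 ON t2.item = c.item2 AND t2.tid = t1.tid
        GROUP BY c.item1, c.item2
        HAVING COUNT(*) >= ?
        ORDER BY c.item1, c.item2
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    return rows


# Hypothetical market-basket data keyed by transaction id.
baskets = {
    1: ["bread", "milk"],
    2: ["bread", "milk", "eggs"],
    3: ["milk", "eggs"],
    4: ["bread", "eggs", "milk"],
}
pairs = frequent_pairs(baskets, min_support=3)
```

For larger itemsets the same pattern extends to k copies of the transaction table, which is why the join order and its optimizations matter for performance.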