Search results - BazTech

1

Evaluation of Sentence-Selection Text Summarization Methods on Polish News Articles

Dudczak A., Stefanowski J., Weiss D.

Foundations of Computing and Decision Sciences | 2010 | Vol. 35, No. 1 | 27-41

EN

This paper presents the goals, results and conclusions of an experiment in which several shallow text summarization methods were applied to news articles written in Polish. Specifically, we focused on various techniques of salient sentence selection, as these algorithms are the most popular in the English-speaking world and are computationally efficient. The quality of automatically generated summaries was evaluated by comparing them against a reference set of man-made summaries. This reference set of summaries is a valuable resource on its own, as it comes from a survey in which user groups from different backgrounds were asked to select "the most appropriate" set of sentences to form a generic, informative summary.

2

Extending k-means with the description comes first approach

Stefanowski J., Weiss D.

Control and Cybernetics | 2007 | Vol. 36, No. 4 | 1009-1035

EN

This paper describes a technique for clustering large collections of short and medium length text documents such as press articles, news stories and the like. The technique, called description comes first (DCF), consists of identifying related document clusters, selecting salient phrases relevant to these clusters, and reallocating documents that match the selected phrases to form the final document groups. The advantages of this technique include more comprehensive cluster labels and a clearer (more transparent) relationship between cluster labels and their content. We demonstrate DCF by taking a standard k-means algorithm as a baseline and weaving DCF elements into it; the outcome is the descriptive k-means (DKM) algorithm. The paper goes through the technical background, explaining how to implement DKM efficiently, and ends with a description of an experiment measuring clustering quality on the 20-newsgroups benchmark document collection. Short fragments of this paper appeared at the poster session of the RIAO 2007 conference, Pittsburgh, PA, USA (electronic proceedings only).

3

Methods for assessment of the occupational exposure at working places of different TENORM industrial branches

Weiss D., Biesold H., Jovanovic P., Juhasz L., Juhasz L., Laciok A., Leopold K., Leopold K., Michalik B., Moravanska H., Poffijn A., Popescu M., Radulescu C., Szerbin P., Wiegand J.

Prace Naukowe GIG. Górnictwo i Środowisko / Główny Instytut Górnictwa | 2004 | No. 1 | 60-61

4

Traceability: taming uncontrolled change in software development

Kowalczykiewicz K., Weiss D.

Foundations of Computing and Decision Sciences | 2002 | Vol. 27, No. 4 | 239-248

EN

Current trends in software engineering focus on processes and methodology of human-controlled change management. In the paper we show that there is a strong need for automated, integrated software development environments, where change propagation would be facilitated not only by humans (even if process-driven), but also by software modules specifically dedicated to this task and spanning all project artefacts. We prove that such an integrated approach has a great impact on software development and we describe a project that attempts to fulfil the assumptions presented in this paper.

5

Juicer - a data mining approach to information extraction from the WWW

Masłowska I., Weiss D.

Foundations of Computing and Decision Sciences | 2000 | Vol. 25, No. 2 | 67-87

EN

We present a novel approach to automatic text mining on the World Wide Web. Because the enormously dynamic growth of the WWW creates a need for new, more powerful information extraction tools, we designed and implemented a system that adapts techniques originally introduced in the field of data mining. We believe that similar systems, which are usually based on machine learning or natural language processing methods, can prove ineffective when dealing with very large numbers of hypertext documents of different structure and subject. Moreover, such systems tend to treat HTML documents as plain text, not taking into account the additional information contained in their markup tags.