Search results

Results found: 5
1
This paper presents the goals, results and conclusions of an experiment in which several shallow text summarization methods were applied to news articles written in Polish. Specifically, we focused on various techniques of salient sentence selection, as these algorithms are the most popular in the English-speaking world and are computationally efficient. The quality of the automatically generated summaries was evaluated by comparing them against a reference set of human-made summaries. This reference set is a valuable resource in its own right, as it comes from a survey in which user groups from different backgrounds were asked to select "the most appropriate" set of sentences to form a generic, informative summary.
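Salient sentence selection and the comparison against a reference set can be illustrated with a short Python sketch. It assumes a simple word-frequency score (a sentence scores the average document frequency of its words) and a naive regex sentence splitter; neither is claimed to be one of the methods evaluated in the paper, and all function names are hypothetical. Agreement with a human reference is measured here as precision, recall and F1 over selected sentence indices, which is one plausible overlap measure rather than the paper's stated metric.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # \w is Unicode-aware in Python 3, so Polish letters such as "ł" are kept.
    return re.findall(r"\w+", sentence.lower())


def select_salient(sentences: list[str], k: int) -> list[int]:
    """Return indices of the k highest-scoring sentences, in document order."""
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(i: int) -> float:
        toks = tokenize(sentences[i])
        # Average document frequency of the sentence's words.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(range(len(sentences)), key=score, reverse=True)
    return sorted(ranked[:k])


def overlap_scores(selected: list[int], reference: list[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 of selected sentence indices against one reference summary."""
    sel, ref = set(selected), set(reference)
    hits = len(sel & ref)
    p = hits / len(sel) if sel else 0.0
    r = hits / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

With several human reference summaries per article, as in the survey described above, the per-reference scores could simply be averaged; a real evaluation for Polish would also likely need lemmatization, since inflected word forms otherwise count as different terms.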
2
Extending k-means with the description comes first approach
This paper describes a technique for clustering large collections of short- and medium-length text documents such as press articles, news stories and the like. The technique, called description comes first (DCF), consists of identifying related document clusters, selecting salient phrases relevant to these clusters, and reallocating the documents that match the selected phrases to form the final document groups. The advantages of this technique include more comprehensive cluster labels and a clearer (more transparent) relationship between cluster labels and their content. We demonstrate DCF by taking a standard k-means algorithm as a baseline and weaving DCF elements into it; the outcome is the descriptive k-means (DKM) algorithm. The paper covers the technical background needed to implement DKM efficiently and ends with a description of an experiment measuring clustering quality on the 20-newsgroups benchmark document collection. Short fragments of this paper appeared at the poster session of the RIAO 2007 conference, Pittsburgh, PA, USA (electronic proceedings only).
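The three DCF steps (cluster, describe, reallocate) can be sketched in Python as a post-processing pass over plain k-means. This is a deliberately simplified illustration, not the authors' DKM: it assumes single terms instead of multi-word phrases, picks each cluster's label as the highest-weight term of its centroid, seeds spherical k-means with the first k documents for determinism, and uses raw term frequencies without IDF. All function names are hypothetical.

```python
import math
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def normalize(vec: dict[str, float]) -> dict[str, float]:
    # Scale a sparse vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def vectorize(docs: list[str]) -> list[dict[str, float]]:
    # Raw term-frequency vectors, L2-normalised (no IDF weighting in this sketch).
    vecs = []
    for d in docs:
        tf: dict[str, float] = defaultdict(float)
        for tok in tokenize(d):
            tf[tok] += 1.0
        vecs.append(normalize(tf))
    return vecs


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    # Iterate over the smaller sparse vector.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean(vecs: list[dict[str, float]]) -> dict[str, float]:
    acc: dict[str, float] = defaultdict(float)
    for v in vecs:
        for t, w in v.items():
            acc[t] += w
    return {t: w / len(vecs) for t, w in acc.items()}


def kmeans(vecs: list[dict[str, float]], k: int, iters: int = 10) -> list[dict[str, float]]:
    # Step 1 (cluster): spherical k-means seeded with the first k documents (assumption).
    cents = [dict(v) for v in vecs[:k]]
    for _ in range(iters):
        assign = [max(range(k), key=lambda j: dot(v, cents[j])) for v in vecs]
        for j in range(k):
            members = [v for v, a in zip(vecs, assign) if a == j]
            if members:
                cents[j] = normalize(mean(members))
    return cents


def describe_then_reallocate(docs: list[str], k: int, iters: int = 10) -> dict[str, list[int]]:
    cents = kmeans(vectorize(docs), k, iters)
    # Step 2 (describe): label each cluster with its highest-weight unused centroid term.
    labels: list[str] = []
    for c in cents:
        for term, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0])):
            if term not in labels:
                labels.append(term)
                break
    # Step 3 (reallocate): each final group is the set of documents containing its label.
    return {lab: [i for i, d in enumerate(docs) if lab in tokenize(d)] for lab in labels}
```

Because final groups are built by label matching rather than by nearest centroid, a document may fall into several groups or into none; this is what ties each group's content directly to its description, which is the transparency argument made in the abstract.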
4
Traceability: taming uncontrolled change in software development
Current trends in software engineering focus on the processes and methodology of human-controlled change management. In this paper we show that there is a strong need for automated, integrated software development environments in which change propagation is facilitated not only by humans (even when process-driven) but also by software modules dedicated specifically to this task and spanning all project artefacts. We demonstrate that such an integrated approach has a great impact on software development, and we describe a project that attempts to fulfil the assumptions presented in this paper.
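The abstract does not specify how the dedicated modules propagate change. One common way to picture automated propagation across artefacts is a traceability graph whose directed edges link requirements, design, source code and tests, so that a change to one artefact triggers a traversal flagging every downstream artefact for review. The Python sketch below assumes that model; the artefact identifiers and the function name are hypothetical and are not taken from the paper's project.

```python
from collections import deque


def affected_artefacts(trace_links: dict[str, list[str]], changed: str) -> set[str]:
    """Breadth-first search over trace links; returns every artefact reachable
    from `changed`, excluding `changed` itself. Cycles are handled via `seen`."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in trace_links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(changed)
    return seen


if __name__ == "__main__":
    # Hypothetical project: a requirement traced to design, code and tests.
    links = {
        "REQ-1": ["DES-1"],
        "DES-1": ["SRC-a", "SRC-b"],
        "SRC-a": ["TEST-a"],
        "TEST-a": ["REQ-1"],  # back-link forming a cycle
    }
    print(sorted(affected_artefacts(links, "REQ-1")))
    # → ['DES-1', 'SRC-a', 'SRC-b', 'TEST-a']
```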
5
Juicer - a data mining approach to information extraction from the WWW
We present a novel approach to automatic text mining on the World Wide Web. Because the enormously dynamic growth of the WWW creates a need for new, more powerful information extraction tools, we designed and implemented a system that adapts techniques originally introduced in the field of data mining. We believe that similar systems, which are usually based on machine learning or natural language processing methods, may prove ineffective when dealing with very large numbers of hypertext documents of varying structure and subject matter. Moreover, such systems tend to treat HTML documents as plain text, ignoring the additional information contained in their markup tags.