A new text categorization problem is introduced. As in the classical problem, there is a set of documents and a set of categories. However, in addition to being assigned to a specific category, each document belongs to a certain sequence of documents, referred to as a case. It is assumed that all documents in the same case belong to the same category. An example is a set of news articles whose categories may be sport, politics, entertainment, etc. Within each category there are cases, i.e., sequences of documents describing, for example, the evolution of some event. The problem considered is how to assign a document to the proper category and to the proper case within that category. In the paper we formalize the problem and discuss two approaches to its solution.
Bipolar linguistic summaries of data are conceived as an extension of ‘classical’ linguistic summarization, a data mining technique that reveals complex patterns present in data in a human-consistent form. The proposed extension is based on the possibilistic interpretation of the ‘and possibly’ operator and on a newly introduced notion of context, which together lead to a new ‘contextual and possibly’ operator. Because the end user expects the most relevant summaries, ways of determining the quality of summary propositions (quality measures) need to be developed. Here we focus on specific insights into the quality measures of the proposed bipolar linguistic summaries of data and present some basic examples illustrating their correctness and the need for their introduction.