This paper studies the possibility of processing text documents using topological information on keywords, by which we mean the positions of the keywords within the text. While word counts are independent of the order of words in the text, the topological, i.e. position-related, information depends directly on that order. As a result, the presented method no longer treats texts as amorphous collections of words but as linearly ordered sequences of words. The topological approach introduced here thus operates at a higher level than the popular bag-of-words approaches, and its advantage should become apparent in applications to texts on similar themes: because such texts have similar keyword counts, the topological information may prove indispensable. It should also require significantly smaller sets of keywords than the bag-of-words approaches.
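The contrast between count-based and position-based information can be illustrated with a minimal sketch. The function names, the whitespace tokenizer, the two example sentences, and the choice of raw token indices as the "topological" representation are all assumptions made for illustration; they are not taken from the method described in the paper.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def keyword_counts(text, keywords):
    # Bag-of-words view: how often each keyword occurs, ignoring order.
    return Counter(t for t in tokenize(text) if t in keywords)


def keyword_positions(text, keywords):
    # Position-based view: the token indices at which each keyword occurs.
    positions = {k: [] for k in keywords}
    for i, token in enumerate(tokenize(text)):
        if token in positions:
            positions[token].append(i)
    return positions


# Illustrative keyword set and texts (assumed, not from the paper).
keywords = {"market", "crash"}
text_a = "the market fell and then the crash came"
text_b = "the crash came and then the market fell"

# Both texts yield identical keyword counts ...
print(keyword_counts(text_a, keywords) == keyword_counts(text_b, keywords))
# ... but the keyword positions differ between them.
print(keyword_positions(text_a, keywords))
print(keyword_positions(text_b, keywords))
```

In this toy case the two sentences cannot be told apart by their keyword counts, while the keyword positions separate them using only two keywords.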