Traditional information retrieval techniques are becoming inadequate for the increasingly vast amounts of text data. We present a query-processing method that retrieves not only documents containing the query terms but also documents containing their synonyms. The method processes a query by retrieving and scanning the document lists of the inverted index. We show that the query response time for conjunctive Boolean queries can be reduced dramatically, at some cost in secondary storage, by using Oracle's range partitioning feature to reduce the primary memory needed to look up the inverted lists. The proposed method uses fuzzy relations and fuzzy reasoning to retrieve only the top-ranking documents from the database, and it groups the retrieved documents using suffix tree clustering.
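The core retrieval step described above, a conjunctive Boolean query over inverted lists where each term also matches its synonyms, can be sketched as follows. This is a minimal illustration under stated assumptions: the toy index, the `SYNONYMS` table, and the function names are hypothetical, and the sketch does not reproduce the paper's fuzzy-relation ranking, Oracle range partitioning, or suffix tree clustering.

```python
from typing import Dict, List, Set

# Hypothetical toy inverted index: term -> sorted list of document IDs.
INDEX: Dict[str, List[int]] = {
    "car": [1, 4, 7],
    "automobile": [2, 7, 9],
    "fast": [1, 2, 3, 9],
    "quick": [4],
}

# Hypothetical synonym table (an assumption; the paper derives related terms via fuzzy relations).
SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile"},
    "fast": {"quick"},
}


def expanded_postings(term: str) -> Set[int]:
    """Union of the postings of a term and all of its synonyms."""
    docs: Set[int] = set(INDEX.get(term, []))
    for syn in SYNONYMS.get(term, set()):
        docs.update(INDEX.get(syn, []))
    return docs


def conjunctive_query(terms: List[str]) -> List[int]:
    """AND query: a document must match every term or one of that term's synonyms."""
    if not terms:
        return []
    # Intersect the smallest expanded lists first so the running result shrinks quickly.
    lists = sorted((expanded_postings(t) for t in terms), key=len)
    result = lists[0]
    for docs in lists[1:]:
        result = result & docs
        if not result:
            break
    return sorted(result)


print(conjunctive_query(["car", "fast"]))  # prints [1, 2, 4, 9]
```

Without synonym expansion, the same query would match only document 1; expanding each term to a union of posting lists before intersecting is what lets documents 2, 4 and 9 through.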
The suffix array is a widely used full-text index that allows fast searches on the text. It is constructed by sorting all suffixes of the text in lexicographic order and storing pointers to the suffixes in this order. Binary search on the suffix array then locates occurrences of a pattern quickly. The compact suffix array is a compressed form of the suffix array that still supports binary search, although its search times also depend on the degree of compression. In this paper, we give efficient methods for constructing and querying compact suffix arrays. We also study practical issues, such as the trade-off between compression and search time, and show how to reduce the space requirement of the construction. Experimental results are provided in comparison with other search methods. On a large text corpus, the index took 1.6 times the size of the text, while searches were only two times slower than on a suffix array.
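The plain (uncompressed) suffix array and its binary search, as described above, can be sketched as follows. This is a minimal illustration, not the paper's compact suffix array: construction here is the naive approach of sorting suffix start positions by their suffixes, and the function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Sort suffix start positions by the suffix they begin (naive, fine for small texts)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of `pattern` using two binary searches on `sa`."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes in sa[start:lo] begin with the pattern.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                # prints [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))  # prints [1, 3]
```

Because the suffixes are in lexicographic order, all suffixes that start with the pattern form one contiguous range, so two binary searches of O(m log n) character comparisons each find every occurrence. A compact suffix array keeps this search structure while storing the array in compressed form, which is why its search times depend on the compression.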