Latent semantic indexing for patent documents

Moldovan, A.; Boţ, R. I.; Wanka, G.

Article - details

Article title

Latent semantic indexing for patent documents

Authors

Moldovan A., Boţ R. I., Wanka G.

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/amc/amc15/amc15412.pdf [remote]

Abstracts

Since the huge database of patent documents is continuously growing, classifying, updating and retrieving patent documents has become an acute necessity. We therefore investigate the efficiency of applying Latent Semantic Indexing (LSI), an automatic indexing method for information retrieval, to some classes of patent documents from the United States Patent Classification System. We present experiments that determine the optimal number of dimensions for the latent semantic space, and we compare the performance of LSI with that of the Vector Space Model (VSM) on real-life text documents, namely patent documents. However, we do not strongly recommend LSI as an improved alternative to the VSM, since the results are not significantly better.
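The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-document matrix, the raw term counts (no weighting scheme), the choice k = 2, and the helper names `cosine_scores`, `lsi_fit` and `lsi_scores` are all assumptions made for illustration. VSM ranks documents by cosine similarity in the full term space, while LSI first projects the query and the documents onto the k leading singular directions of the term-document matrix.

```python
import numpy as np

def cosine_scores(query_vec, doc_matrix):
    """Cosine similarity between a query vector and each column of doc_matrix."""
    denom = np.linalg.norm(query_vec) * np.linalg.norm(doc_matrix, axis=0)
    denom[denom == 0.0] = 1.0  # avoid division by zero for empty vectors
    return (query_vec @ doc_matrix) / denom

def lsi_fit(term_doc, k):
    """Rank-k truncated SVD of a terms x documents matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def lsi_scores(query_vec, U_k, s_k, Vt_k):
    """Cosine similarity in the k-dimensional latent space."""
    query_latent = U_k.T @ query_vec      # project the query onto the latent axes
    docs_latent = s_k[:, None] * Vt_k     # document coordinates: diag(s_k) @ Vt_k
    return cosine_scores(query_latent, docs_latent)

# Hypothetical toy corpus (rows: terms, columns: documents), raw counts.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1.0, 0.0, 0.0],   # car
    [0.0, 1.0, 0.0],   # automobile
    [1.0, 1.0, 0.0],   # engine
    [0.0, 0.0, 1.0],   # flower
    [0.0, 0.0, 1.0],   # petal
])

query = np.zeros(len(terms))
query[terms.index("car")] = 1.0           # one-word query: "car"

vsm = cosine_scores(query, term_doc)
U_k, s_k, Vt_k = lsi_fit(term_doc, k=2)   # k = 2 is an arbitrary choice here
lsi = lsi_scores(query, U_k, s_k, Vt_k)

print("VSM:", np.round(vsm, 3))
print("LSI:", np.round(lsi, 3))
```

In this toy setting the second document shares no term with the query, so VSM scores it zero, whereas LSI can still match it through the co-occurring term "engine". Whether such an effect yields a measurable gain on real patent collections is exactly what the abstract reports as not significant.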

Keywords

Latent Semantic Indexing (LSI); singular value decomposition (SVD); vector space model (VSM); patent classification

Publisher

Oficyna Wydawnicza Uniwersytetu Zielonogórskiego

Journal

International Journal of Applied Mathematics and Computer Science

Year

2005

Volume

Vol. 15, No. 4

Pages

551–560

Physical description

Bibliography: 25 items, tables, charts.

Contributors

author

Moldovan A.

amol@mathematik.tu-chemnitz.de

Faculty of Mathematics, Chemnitz University of Technology, D–09107 Chemnitz, Germany

author

Boţ R. I.

radu.bot@mathematik.tu-chemnitz.de

Faculty of Mathematics, Chemnitz University of Technology, D–09107 Chemnitz, Germany

author

Wanka G.

gert.wanka@mathematik.tu-chemnitz.de

Faculty of Mathematics, Chemnitz University of Technology, D–09107 Chemnitz, Germany

Bibliography

[1] ARPACK (2005): Arnoldi package. — Available at: http://www.ime.unicamp.br/chico/arpack++.
[2] Bartell B.T., Cottrell G.W. and Belew R.K. (1992): Latent semantic indexing is an optimal special case of multidimensional scaling. — Proc. ACM/SIGIR’92 Conf., Copenhagen, Denmark, pp. 161–167.
[3] Berry M.W., Dumais S.T. and O’Brien G.W. (1995): Using linear algebra for intelligent information retrieval. — SIAM Rev., Vol. 37, No. 4, pp. 573–595.
[4] Deerwester S., Dumais S.T., Furnas G.W., Landauer T.K. and Harshman R. (1990): Indexing by latent semantic analysis. — J. Amer. Soc. Inf. Sci., Vol. 41, No. 6, pp. 391–407.
[5] Ding C.H.Q. (1999): A similarity-based probability model for latent semantic indexing. — Proc. 22nd ACM/SIGIR Conf., Berkeley, CA, pp. 58–65.
[6] Dumais S.T. (1991): Improving the retrieval of information from external sources. — Behav. Res. Meth. Instrum. Comput., Vol. 23, No. 2, pp. 229–236.
[7] Dumais S.T. (1995): Using LSI for information filtering: TREC-3 experiments. — Proc. 3rd Text REtrieval Conf., TREC3, Gaithersburg, MD, pp. 219–230.
[8] Fuhr N. (1989): Models for retrieval with probabilistic indexing. — Inf. Process. Manag., Vol. 25, No. 1, pp. 55–72.
[9] Fuhr N. (1992): Probabilistic models in information retrieval. — Comput. J., Vol. 35, No. 3, pp. 243–255.
[10] Hull D. (1994): Improving text retrieval for the routing problem using latent semantic indexing. — Proc. 17th ACM/SIGIR Conf., Dublin, Ireland, pp. 282–290.
[11] Hull D. (1996): Stemming algorithms: A case study for detailed evaluation. — J. Amer. Soc. Inf. Sci., Vol. 47, No. 1, pp. 70–84.
[12] Jessup E.R. and Martin J.H. (2001): Taking a new look at the latent semantic analysis approach to information retrieval. — Proc. SIAM Workshop Computational Information Retrieval, Raleigh, NC, pp. 121–144.
[13] Kolda T.G. and O’Leary D.P. (1998): A semidiscrete matrix decomposition for latent semantic indexing information retrieval. — ACM Trans. Inf. Syst. (TOIS), Vol. 16, No. 4, pp. 322–346.
[14] Landauer T.K., Foltz P. and Laham D. (1998): Introduction to latent semantic analysis. — Discourse Processes, Vol. 25, pp. 259–284.
[15] MED (2005): Medlon collection. — Available at: ftp://ftp.cs.cornell.edu/pub/smart/med.
[16] Papadimitriou C.H., Raghavan P., Tamaki H. and Vempala S. (1998): Latent semantic indexing: A probabilistic analysis. — Proc. Symp. Principles of Database Systems, PODS, Seattle, Washington, pp. 150–168.
[17] PorterStemmer (2005): The Porter stemming algorithm. — Available at: http://www.tartarus.org/martin/PorterStemmer.
[18] Salton G. (1971): The SMART Retrieval System: Experiments in Automatic Document Processing. — Englewood Cliffs, NJ: Prentice Hall.
[19] Schütze H. (1992): Dimensions of meaning. — Proc. Conf. Supercomputing ’92, Minneapolis, MN, pp. 787–796.
[20] Schütze H. (1998): Automatic word sense discrimination. — Comput. Linguist., Vol. 24, No. 1, pp. 97–124.
[21] SMART (2005): SMART’s English stoplist. — Available at: ftp://ftp.cs.cornell.edu/pub/smart/english.stop.
[22] TIME (2005): Time magazine collection. — Available at: ftp://ftp.cs.cornell.edu/pub/smart/time.
[23] Story R.E. (1996): An explanation of the effectiveness of Latent Semantic Indexing by means of a Bayesian regression model. — Inf. Process. Manag., Vol. 32, No. 3, pp. 329–344.
[24] USPTO (2005): United States Patent and Trademark Office. — Available at: http://www.uspto.gov.
[25] Zha H., Marques O. and Simon H. (1998): A subspace-based model for information retrieval with applications in latent semantic indexing. — Proc. Conf. Irregular ’98, Berkeley, CA, pp. 29–42.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPZ2-0018-0050