PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Tytuł artykułu

Indexing Schemes for Similarity Search: an Illustrated Paradigm

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
We suggest a variation of the Hellerstein-Koutsoupias-Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. This in particular provides a unified approach to a great variety of schemes used to index into metric spaces and facilitates their transfer to more general similarity measures such as quasi-metrics. We discuss links between performance of indexing schemes and high-dimensional geometry. The concepts and results are illustrated on a very large concrete dataset of peptide fragments equipped with a biologically significant similarity measure.
Wydawca
Rocznik
Strony
367--385
Opis fizyczny
wykr., bibliogr. 24 poz.
Twórcy
autor
  • Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Ave., Ottawa, Ontario, Canada KIN 6N5, Vpest283@uottawa.ca
Bibliografia
  • [1] N. Alon, The linear arboricity of graphs, Israel J. Math. 62 (1988), no. 3, 311-325.
  • [2] A. Bairoch and R. Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28:45-48, 2000.
  • [3] Z. Bi, C. Faloutsos, and F. Korn. The "DGX" distribution for mining massive, skewed data. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pages 17-26, San Francisco, California, 2001. ACM Press New York, NY, USA.
  • [4] S. Brin, Near neighbor search in large metric spaces, in: Proc. of the 21st VLDB Conf., Zurich, Switzerland, Sept. 1995, pp. 574-584.
  • [5] A. Carbone and M. Gromov. Mathematical slices of molecular biology. Numéro spécial de la Gazette des Mathematiciens, Société Mathématique de France, 88:11-80, 2001.
  • [6] E. Chávez, G. Navarro, R. A. Baeza-Yates, and J. L. Marroqu´ın. Searching in metric spaces. ACM Computing Surveys, 33(3):273-321, 2001.
  • [7] P. Ciaccia, M. Patella, and P. Zezula, A cost model for similarity queries in metric spaces, in: Proc. 17-th Annual ACM Symposium on Principles of Database Systems (PODS'98), Seattle, WA, June 1998, pp. 59-68.
  • [8] N. Goodman. Ome sweet ome. Genome Technology, pages 56-59, April 2002.
  • [9] M. Gromov. Metric Structures for Riemannian and Non-Riemannian Spaces, volume 152 of Progress in Mathematics. Birkhauser, 1999.
  • [10] J. M. Hellerstein, E. Koutsoupias, D. P. Miranker, C. H. Papadimitriou, and V. Samoladas. On a model of indexability and its bounds for range queries. Journal of the ACM (JACM), 49(1):35-55, 2002.
  • [11] J. M. Hellerstein, E. Koutsoupias, and C. H. Papadimitriou. On the analysis of indexing schemes. In Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 249-256, Tucson, Arizona, 12-15 May 1997.
  • [12] S. Henikoff and J. Henikoff. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A., 89:10915-10919, 1992.
  • [13] P. Indyk and R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in: Proc. 30-th Symp. on Theory of Computing, 1998, pp. 604-613.
  • [14] M. Ledoux. The Concentration of Measure Phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, 2001.
  • [15] J. Matoušek, Lectures on Discrete Geometry, volulme 212 of Graduate Texts in Mathematics, Springer-Verlag, NY a.o., 2002.
  • [16] V.D. Milman, Topics in asymptotic geometric analysis, GAFA 2000 (Tel Aviv, 1999), Geom. Funct. Anal.Special Volume, Part I (2000), 792-815.
  • [17] C.H. Papadimitriou, Database metatheory: asking the big queries, in: Proc. 14-th PODS, San Jose, CA, May 1995, pp. 1-10.
  • [18] V. Pestov. A geometric framework for modelling similarity search. In Proceedings of the 10th International Conference on Database and Expert Systems Applications (DEXA'99), pages 150-154, Florence, Italy, Sept 1999. IEEE Computer Society, Los Alamitos, CA.
  • [19] V. Pestov. On the geometry of similarity search: dimensionality curse and concentration of measure. Information Processing Letters, 73:47-51, 2000.
  • [20] C. Traina, Jr., A.J.M. Traina, and C. Faloutsos, Distance exponent: A new concept for selectivity estimation in metric trees, Technical Report CMU-CS-99-110, Computer Science Department, Carnegie Mellon University, 1999.
  • [21] J.K. Uhlmann, Satisfying general proximity/similarity queries with metric trees, Information Processing Letters 40 (1991), 175-179.
  • [22] R. Weber, H.-J. Schek, and S. Blott, A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces, in: Proc. of the 24st VLDB Conf., New York, USA, Aug. 1998, pp.194-205.
  • [23] J. Wootton and S. Federhen. Analysis of compositionally biased regions in sequence databases. Methods Enzymol., 266:554-571, 1996.
  • [24] P. Yianilos, Data structures and algorithms for nearest neighbor search in general metric spaces, in: Proc. 3rd Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 311-321, 1993.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS2-0009-0064
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.