PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Powiadomienia systemowe
  • Sesja wygasła!
Tytuł artykułu

Compact and hash based variants of the suffix array

Treść / Zawartość
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Full-text indexing aims at building a data structure over a given text capable of efficiently finding arbitrary text patterns, and possibly requiring little space. We propose two suffix array inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches due to significantly narrowed search interval before the binary search phase. The other, called FBCSA, is a compact data structure, similar to Mäkinen’s compact suffix array (MakCSA), but working on fixed size blocks. Experiments on the widely used Pizza & Chili datasets show that SA-hash is about 2–3 times faster in pattern searches (counts) than the standard suffix array, for the price of requiring 0.2n–1.1n bytes of extra space, where n is the text length. FBCSA, in one of the presented variants, reduces the suffix array size by a factor of about 1.5–2, while it gets close in search times, winning in speed with its competitors known from the literature, MakCSA and LCSA.
Rocznik
Strony
407--418
Opis fizyczny
Bibliogr. 30 poz., tab., wykr.
Twórcy
autor
  • Institute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 Łódź, Poland
  • Institute of Applied Computer Science, Lodz University of Technology, 11 Politechniki Ave., 90-924 Łódź, Poland
Bibliografia
  • [1] S. Gog and M. Petri, “Optimized succinct data structures for massive data”, Soft. Pract. Exper. 44 (11), 1287–1314 (2014).
  • [2] J. Kärkkäinen and S.J. Puglisi, “Fixed block compression boosting in FM-indexes”, 18th Int. Symp. String Processing and Information Retrieval, SPIRE 2011, 174–184 (2011).
  • [3] G. Navarro and V. Mäkinen, “Compressed full-text indexes”, ACM Comput. Surv. 39 (1), Article 2 (2007).
  • [4] U. Manber and G. Myers, “Suffix arrays: A new method for on-line string searches”, 1st Annual ACM-SIAM Symp. Discrete Algorithms, SODA 1990, 319–327 (1990).
  • [5] V. Mäkinen, “Compact suffix array”, 11th Int. Symp. Combinatorial Pattern Matching, CPM 2000, 305–319 (2000).
  • [6] V. Mäkinen, “Compact suffix array – a space-efficient full-text index”, Fund. Inform. 56 (1–2), 191–210 (2003).
  • [7] P. Weiner, “Linear pattern matching algorithm”, 14th Annual IEEE Symp. Switching and Automata Theory, 1–11 (1973).
  • [8] M. Farach, “Optimal suffix tree construction with large alphabets”, 38th IEEE Annual Symp. Foundations of Computer Science, FOCS 1997, 137–143 (1997).
  • [9] S. Kurtz and B. Balkenhol, “Space efficient linear time computation of the Burrows and Wheeler transformation”, In: Numbers, Information and Complexity, Kluwer Academic Publishers, 2000, pp. 375–383.
  • [10] P. Ferragina and J. Fischer, “Suffix arrays on words”, 18th Int. Symp. Combinatorial Pattern Matching, CPM 2007, 328–339 (2007).
  • [11] J. Kärkkäinen and P. Sanders, “Simple linear work suffix array construction”, 30th Int. Coll. Automata, Languages and Programming, ICALP 2003, 943–955 (2003).
  • [12] G. Nong, S. Zhang, and W. H. Chan, “Two efficient algorithms for linear time suffix array construction”, IEEE Trans. Comput. 60 (10), 1471–1484 (2011).
  • [13] J. Kärkkäinen, “Suffix cactus: A cross between suffix tree and suffix array”, 6th Int. Symp. Combinatorial Pattern Matching, CPM 1995, 191–204 (1995).
  • [14] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch, “The enhanced suffix array and its applications to genome analysis”, 2nd Int. Workshop Algorithms in Bioinformatics, WABI 2002, 449–463 (2002).
  • [15] N. Grimsmo, On Performance and Cache Effects in Substring Indexes, Tech. Rep. IDI-TR-2007‒04, Department of Computer and Information Science, NTNU, Trondheim, 2007.
  • [16] R. Cole, T. Kopelowitz, and M. Lewenstein, “Suffix trays and suffix trists: Structures for faster text indexing”, 30th Int. Coll. Automata, Languages and Programming, Part I, ICAPL 2006, 358–369 (2006).
  • [17] J. Fischer and P. Gawrychowski, “Alphabet-dependent string searching with wexponential search trees”, 26th Int. Symp. Combinatorial Pattern Matching, CPM 2015, 160–171 (2015).
  • [18] R. Grossi and J. S. Vitter, “Compressed suffix arrays and suffix trees with applications to text indexing and string matching”, 32nd ACM Symp. Theory of Computing, STOC 2000, 397–406 (2000).
  • [19] K. Sadakane, “Succinct representations of lcp information and improvements in the compressed suffix arrays”, 13th Annual ACM-SIAM Symp. Discrete Algorithms, SODA 2002, 225–232 (2002).
  • [20] P. Ferragina and G. Manzini, “Opportunistic data structures with applications”, 41th IEEE Annual Symp. Foundations of Computer Science, FOCS 2000, 390–398 (2000).
  • [21] P. Ferragina, R. González, G. Navarro, and R. Venturini, “Compressed text indexes: From theory to practice”, ACM J. Exp. Algorithmics 13, 12:1.12–12:1.31 (2009).
  • [22] R. González and G. Navarro, “Compressed text indexes with fast locate”, 18th Int. Symp. Combinatorial Pattern Matching, CPM 2007, 216–227 (2007).
  • [23] R. González, G. Navarro, and H. Ferrada, “Locally compressed suffix arrays”, ACM J. Exp. Algorithmics 19 (1), Article 1 (2014).
  • [24] S. Gog and A. Moffat, “Adding compression and blended search to a compact two-level suffix array”, 20th Int. Symp. String Processing and Information Retrieval, SPIRE 2013, 141–152 (2013).
  • [25] S. Gog, A. Moffat, J. S. Culpepper, A. Turpin, and A. Wirth, “Large-scale pattern search using reduced-space on-disk suffix arrays”, IEEE Trans. Knowl. Data Eng. 26 (8), 1918–1931 (2014).
  • [26] F.C. Botelho, Near-Optimal Space Perfect Hashing Algorithms, Ph.D. Dissertation, Federal University of Minas Gerais, Department of Computer Science, Belo Horizonte, Brazil (2008).
  • [27] B. Schlegel, R. Gemulla, and W. Lehner, “K-ary search on modern processors”, 5th Int. Workshop Data Management on New Hardware, DaMoN 2009, 52–60 (2009).
  • [28] P.-V. Khuong and P. Morin, “Array layouts for comparison-based searching”, CoRR, abs/1509.05053 (2015).
  • [29] A. Moffat and S. Gog, “String search experimentation using massive data”, Philosophical Transactions of the Royal Society of London A: Mathematical, Physical, and Engineering Sciences 372 (2016), 20130135 (2014).
  • [30] R. Raman, V. Raman, and S.S. Rao, “Succinct indexable dictionaries with applications to encoding k-ary trees and multisets”, 13th Annual ACM-SIAM Symp. Discrete Algorithms, SODA 2002, 233–242 (2002).
Uwagi
PL
Opracowanie ze środków MNiSW w ramach umowy 812/P-DUN/2016 na działalność upowszechniającą naukę (zadania 2017).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-60159550-3dd2-49d4-9a42-c5404b67b5b5
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.