New algorithms for exact and approximate text matching

Grabowski, Sz.

Artykuł - szczegóły

Tytuł artykułu

New algorithms for exact and approximate text matching

Autorzy

Grabowski Sz.

Identyfikatory

Warianty tytułu

Języki publikacji

Abstrakty

The presented dissertation focuses on various exact and approximate matching problems for textual data. Text should be understood rather broadly, including natural language, molecular biology and music information retrieval data. The work consists of five chapters, each dedicated to a separate problem. In the order of presentation, they deal with exact string matching, approximate string matching, matching with gaps (which could be considered a subclass of approximate string matching problems but the amount of contained material should justify a separate chapter), online compressed search and compressed full-text indexes. The author contributed to each of those research areas. Each chapter, however, starts with background presentation to place the author's achievements in proper context. Many of the algorithms proposed in the dissertation are based on bit-parallelism, i.e., a modern technique of making use of individual bits in a CPU word. In particular, two new bit-parallel techniques have been presented, one for efficient matching in the average case, the other to reduce time complexities in the worst case of bit-parallel algorithms making use on counters. Those are rather general techniques and they have been successfully applied for multiple known string matching problems. It has been shown that the problems of matching with gaps can be attacked from very different angles, and the arsenal of existing techniques in this area has been significantly expanded. The new results comprise the algorithmic techniques of bit-parallelism, sparse dynamic programming, compact bit-parallel NFA simulations and filtering. New algorithms are also presented for full-text searching in compressed data; they are both simple and efficient. Apart from theoretical analyses, most of the proposed algorithms have been experimentally validated on modern hardware and the achieved results usually place them among very competitive ones.

W rozprawie skupiono się na wybranych problemach wyszukiwania dokładnego i przybliżonego w tekście. Pojęcie tekstu winno być rozumiane szeroko, obejmując dane w językach naturalnych, sekwencje bioinformatyczne oraz zapisy utworów muzycznych (nutowe). Praca składa się z pięciu rozdziałów, z których każdy poświęcony jest osobnemu zagadnieniu. W kolejności, są to: wyszukiwanie dokładne, wyszukiwanie przybliżone, wyszukiwanie sekwencji z przerwami (ang. gaps), wyszukiwanie online w tekście skompresowanym oraz pełnotekstowe indeksy skompresowane. Rozprawa wnosi wkład w rozwój każdego z tych problemów. Każdy rozdział zaczyna się jednak od przedstawienia odnośnego stanu wiedzy. Wiele z zaproponowanych w pracy algorytmów wykorzystuje równoległość bitową, nowoczesną technikę obliczeń z wykorzystaniem poszczególnych bitów rejestru procesora. W szczególności, zaprezentowano dwie nowe techniki równoległości bitowej, jedną mającą na celu optymalizację przypadku średniego w wyszukiwaniu dokładnym, drugą redukującą złożoność w przypadku najgorszym w algorytmach wykorzystujących liczniki. Te dość ogólne techniki algorytmiczne zostały pomyślnie zastosowane w szeregu konkretnych znanych algorytmów wyszukiwania. W pracy pokazano, iż do wyszukiwania sekwencji z przerwami można stosować rozliczne podejścia, rozszerzając znacznie istniejący arsenał metod dla problemów tej kategorii. Nowe wyniki bazują m. in. na technikach algorytmicznych równoległości bitowej, programowania dynamicznego rzadkiego i oszczędnych bitowo-równoległych symulacjach automatów NFA. Przedstawiono również algorytmy wyszukiwania w danych skompresowanych, cechujące się zarówno wydajnością, jak i prostotą. Obok analiz teoretycznych, większość algorytmów zaimplementowano i poddano testom empirycznym, a osiągnięte wyniki zwykle pozwalają zaliczyć nowe metody do najefektywniejszych dla danych problemów.

Słowa kluczowe

matching with gaps approximate string matching

Wydawca

Wydawnictwo Politechniki Łódzkiej

Czasopismo

Zeszyty Naukowe. Rozprawy Naukowe / Politechnika Łódzka

Rocznik

2009

Tom

Z. 384

Strony

3--243

Opis fizyczny

Bibliogr. 343 poz.

Twórcy

autor

Grabowski Sz.

Katedra Informatyki Stosowanej Politechnika Łódzka, sgrabow@kis.p.lodz.pl

Bibliografia

[AB92] A. Amir and G. Benson. Two-dimensional periodicity and its applications. In Proc. 3rd ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 440-452. SIAM, 1992.
[Abe07] J. Abel. Incremental frequency count - a post BWT-stage for the Burrows-Wheeler compression algorithm. Software-Practice and Experience, 37(3):247-265, 2007.
[ABF94] A. Amir, G. Benson, and M. Farach. An alphabet independent approach to two-dimensional pattern matching. SIAM Journal on Computing, 23(2):313-323, 1994.
[AC75] A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333-340, 1975.
[AC91] A. Apostolico and M. Crochemore. Optimal canonization of all substrings of a string. Information and Computation, 95(l):76-95, 1991.
[ACH+01] A. Amir, R. Cole, R. Hariharan, M. Lewenstein, and E. Porat. Overlap matching. In Proc. 12th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 279-288. SIAM, 2001.
[ACR99] C. Allauzen, M. Crochemore, and M. Raffinot. Factor oracle: a new structure for pattern matching. In Proc. 26th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), LNCS 1725, pages 291-306. Springer, 1999.
[AD86] L. Allison and T. I. Dix. A bit string longest common subsequence algorithm. Information Processing Letters, 23(6):305-310, 1986.
[ADKF75] V. L. Arlazarov, E. A. Dinic, M. A. Kronrod, and I. A. Faradzev. On economic construction of the transitive closure of a direct graph. Soviet Mathematics, Doklady, 11:1209-1210, 1975. Original in Russian in Doklady Akademii Nauk SSSR, vol. 194, 1970.
[Ad1FN05] J. Adiego, P. de la Fuente, and G. Navarro. Combining structural and textual contexts for compressing semistructured databases. In Proc. Int. Mexican Conference in Computer Science (ENC'05), pages 68-73, Puebla, Mexico, 2005. IEEE CS Press.
[AE05] A. N. Arslan and Ö. Egecioglu. Algorithms for the constrained longest common subsequence problems. International Journal of Foudations of Computer Science, 16(6):1099-1109, 2005.
[AG86] A. Apostolico and R. Giancarlo. The Boyer-Moore-Galil string searching strategies revisited. SIAM Journal on Computing, 15(1):98-105, 1986.
[AG87] A. Apostolico and C. Guerra. The longest common subsequence problem revisited. Algorithmica, 2(l-4):315-336, 1987.
[AHHP94] A. Anderson, T. Hagerup, J. Hastad, and O. Petersson. The complexity of searching a sorted array of strings. In Proc. 26th ACM Symposium on the Theory of Computing (STOC), pages 317-325. ACM Press, 1994.
[AHHP01] A. Andersson, T. Hagerup, J. Hastad, and O. Petersson. Tight bounds for searching a sorted array of strings. SIAM Journal on Computing, 30(5):1552-1578, 2001.
[AK004] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms, 2(1) :53-86, 2004.
[ALM02] O. Arbell, G. M. Landau, and J. S. B. Mitchell. Edit distance of run-length encoded strings. Information Processing Letters, 83(6):307-314, 2002.
[ALP00] A. Amir, M. Lewenstein, and E. Porat. Faster algorithms for string matching with k mismatches. In Proc. 11th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 794-803. SIAM, 2000.
[ALPU05] A. Amir, O. Lipsky, E. Porat, and J. Umanski. Approximate matching in the l1 metric. In Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 3537, pages 91-103. Springer, 2005.
[ALS96] A. Andersson, N. J. Larsson, and K. Swanson. Suffix trees on words. In Proc. 7th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1075, pages 102-115. Springer, 1996.
[ALS99] A. Apostolico, G. M. Landau, and S. Skiena. Matching for run-length encoded strings. Journal of Complexity, 15:4-16, 1999.
[AN07] D. Arroyuelo and G. Navarro. A Lempel-Ziv text index on secondary storage. In Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4580, pages 83-94, 2007.
[ANdlF07] J. Adiego, G. Navarro, and P. de la Fuente. Lempel-Ziv compression of highly structured documents. Journal of the American Society for Information Science and Technology (JASIST), 58(4):461-478, 2007.
[ANZ97] M. D. Araújo, G. Navarro, and N. Ziviani. Large text searching allowing errors. In Proc. 4th South American Workshop on String Processing, pages 2-20. Carleton University Press, 1997.
[Apo97] A. Apostolico. String editing and longest common subsequences. In Handbook of Formal Languages, volume 2 Linear Modeling: Background and Application, chapter 8, pages 361-398. Springer, 1997.
[AR99] C. Allauzen and M. Raffinot. Factor oracle of a set of words. Technical Report 99-11, Institut Gaspard-Monge, Université de Marne-la-Vallée, France, 1999.
[AT05] J. Abel and W. J. Teahan. Universal text preprocessing for data compression. IEEE Transactions on Computers, 54(5):497-507, 2005.
[BAHM07] J. Barbay, L. Castelli Aleardi, M. He, and J. I. Munro. Succinct representation of labeled graphs. Technical Report CS-2007-11, University of Waterloo, Ontario, Canada, 2007.
[BCN09] N. Brisaboa, A. Cerdeira, and G. Navarro. A compressed self-indexed representation of XML documents. In Proc. 13th European Conference on Research and Advanced Technology for Digital Libraries (ECDL), LNCS 5714, pages 273-284, 2009.
[BFG07] Ph. Bille, R. Fagerberg, and I. L. Gortz. Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts. In Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4580, pages 52-62. Springer, 2007.
[BFNE03] N. Brisaboa, A. Farina, G. Navarro, and M. Esteller. (s,c)-dense coding: An optimized compression code for natural language text databases. In Proc. 10th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 2857, pages 122-136. Springer, 2003.
[BFNP04] N. Brisaboa, A. Farina, G. Navarro, and J. L. Paramá. Simple, fast, and efficient natural language adaptive compression. In Proc. 11th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 3246, pages 230-241. Springer, 2004.
[BFNP05] N. Brisaboa, A. Farina, G. Navarro, and J. L. Paramá. Efficiently decodable and searchable natural language adaptive compression. In Proc. 28th Int. Conference on Research and Development in Information Retrieval (SIGIR), pages 234-241. ACM Press, 2005.
[BFNP06] N. Brisaboa, A. Farina, G. Navarro, and J. L. Paramá. Improving semistatic compression via pair-based coding. In Proc. 6th Int. Conference on Perspectives of System Informatics (PSI'06), LNCS 4378, pages 124-134, 2006.
[BFNP07] N. Brisaboa, A. Farina, G. Navarro, and J. L. Paramá. Lightweight natural language text compression. Information Retrieval, 10:1-33, 2007.
[BGS72] M. Beeler, R. W. Gosper, and R. Schroeppel. HAKMEM. MIT AI Memo 239, 1972. http://www.inwap.com/pdp10/hbaker/hakmem/ algorithms.html#iteml79 (link verified Oct. 23, 2009).
[BHR00] L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In Proc. 7th Int. Symposium on String Processing and Information Retrieval (SPIRE), pages 39-48. IEEE Computer Society, 2000.
[Bil09] Ph. Bille. Fast searching in packed strings. In Proc. 20th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 5577, pages 116-126, 2009.
[Bin00] E. Binder. Usenet group: comp.compression, 2000.
[BINP03] N. Brisaboa, E. Iglesias, G. Navarro, and J. L. Paramá. An efficient compression code for text databases. In Proc. 25th European Conference on Information Retrieval Research (ECIR'03), LNCS 2633, pages 468-481. Springer, 2003.
[Bis08] M. T. Biskup. Guaranteed synchronization of Huffman codes. In Proc. Data Compression Conference (DCC), pages 462-471, Snowbird, UT, 2008. IEEE Computer Society Press.
[BK00] B. Balkenhol and S. Kurtz. Universal data compression based on the Burrows-Wheeler transformation: Theory and practice. IEEE Transactions on Computers, 49(10):1043-1053, 2000.
[BKS99] B. Balkenhol, S. Kurtz, and Y. M. Shtarkov. Modifications of the Burrows and Wheeler data compression algorithm. In Proc. Data Compression Conference (DCC), pages 188-197, 1999.
[BM77] R. S. Boyer and J. S. Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762-772, 1977.
[BMM97] A. Brodnik, P. B. Miltersen, and J. I. Munro. Trans-dichotomous algorithms without multiplication - some upper and lower bounds. In Proc. 5th Workshop on Algorithms and Data Structures (WADS), LNCS 1272, pages 426-439. Springer, 1997.
[BR99] T. Berry and S. Ravindran. A fast string matching algorithm and experimental results. In Proc. Prague Stringology Club Workshop '99, pages 16-28, Czech Technical University, Prague, Czech Republic, 1999. Collaborative Report DC-99-05.
[Bre93] D. Breslauer. Saving comparisons in the Crochemore-Perrin string matching algorithm. In Proc. 1st Annual European Symposium on Algorithms (ESA), LNCS 726, pages 61-72. Springer, 1993.
[BSTW86] J. L. Bentley, D. D. Sleator, R. E. Tarjan, and V. K. Wei. A locally adaptive data compression scheme. Communications of the ACM, 29(4):320-330, 1986.
[BW94] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation, Palo Alto, CA, 1994.
[BY89a] R. A. Baeza-Yates. Efficient text searching. Ph. D. Thesis, Department of Computer Science, University of Waterloo, Ontario, Canada, 1989.
[BY89b] R. A. Baeza-Yates. Improved string searching. Software-Practice and Experience, 19(3):257-271, 1989.
[BYBZ96] R. A. Baeza-Yates, E. F. Barbosa, and N. Ziviani. Hierarchies of indices for text searching. Information Systems, 21(6):497-514, 1996.
[BYG89] R. A. Baeza-Yates and G. H. Gonnet. A new approach to text searching. In Proc. 12th Int. Conference on Research and Development in Information Retrieval (SIGIR), pages 168-175. ACM Press, 1989.
[BYG92] R. A. Baeza-Yates and G. H. Gonnet. A new approach to text searching. Communications of the ACM, 35(10):74-82, 1992.
[BYN96] R. A. Baeza-Yates and G. Navarro. A faster algorithm for approximate string matching. In Proc. 7th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1075, pages 1-23, Laguna Beach, CA, 1996. Springer.
[BYN97] R. A. Baeza-Yates and G. Navarro. Multiple approximate string matching. In Proc. 5th Workshop on Algorithms and Data Structures (WADS), LNCS 1272, pages 174-184. Springer, 1997.
[BYN99] R. A. Baeza-Yates and G. Navarro. Faster approximate string matching. Algorithmica, 23(2):127-158, 1999.
[BYN00] R. A. Baeza-Yates and G. Navarro. Block-addressing indices for approximate text retrieval. J. of the American Society for Information Science (JASIS), 51(l):69-82, January 2000.
[BYN04] R. A. Baeza-Yates and G. Navarro. Text Searching: Theory and Practice, pages 565-597. Studies in Fuzzyness and Soft Computing 148. Springer, 2004. ISBN 3-540-20907-7.
[CCF05a] D. Cantone, S. Cristofaro, and S. Faro. An efficient algorithm for ?-approximate matching with a-bounded gaps in musical sequences. In Proc. 4th Workshop on Efficient and Experimental Algorithms (WEA), LNCS 3503, pages 428-439. Springer, 2005.
[CCF05b] D. Cantone, S. Cristofaro, and S. Faro. On tuning the (?,?)-sequential-sampling algorithm for ? -approximate matching with ?-bounded gaps in musical sequences. In Proc. 6th Int. Conference on Music Information Retrieval (ISMIR), 2005.
[CCF08] D. Cantone, S. Cristofaro, and S. Faro. New efficient bit-parallel algorithms for the ? -matching problem with ?-bounded gaps in musical sequences. In Proc. 12th Prague Stringology Conference (PSC), pages 170-184, 2008.
[CCG+94] M. Crochemore, A. Czumaj, L. Gąsieniec, S. Jarominek, T. Lecroq, W. Plandowski, and W. Rytter. Speeding up two string matching algorithms. Algorithmica, 12(4/5) :247-267, 1994.
[CCI+99] E. Cambouropoulos, M. Crochemore, C. S. Iliopoulos, L. Mouchard, and Y. J. Pinzon. Algorithms for computing approximate repetitions in musical sequences. In Proc. 10th Australasian Workshop on Combinatorial Algorithms (AWOCA), pages 129-144, 1999.
[CCI05] P. Clifford, R. Clifford, and C. S. Iliopoulos. Faster algorithms for ?,?-matching and related problems. In Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 3537, pages 68-78. Springer, 2005.
[CGL04] R. Cole, L.-A. Gottlieb, and M. Lewenstein. Dictionary matching and indexing with errors and don't cares. In Proc. 36th ACM Symposium on the Theory of Computing (STOC), pages 91-100. ACM, 2004.
[CGR99] M. Crochemore, L. Gąsieniec, and W. Rytter. Constant-space string-matching in sublinear average time. Theoretical Computer Science, 218(1) :197-203, 1999.
[CH92] R. Cole and R. Hariharan. Tighter bounds on the exact complexity of string matching. In Proc. 33rd IEEE Annual Symposium on Foundations of Computer Science (FOGS), pages 600-609. IEEE Computer Society Press, 1992.
[CH97] R. Cole and R. Hariharan. Tighter upper bounds on the exact complexity of string matching. SI AM Journal on Computing, 26(3) :803-856, 1997.
[CH98] R. Cole and R. Hariharan. Approximate string matching: A simpler faster algorithm. In Proc. 9th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 463-472. SIAM, 1998.
[CHL04] H.-L. Chan, W.-K. Hon, and T. W. Lam. Compressed index for a dynamic collection of texts. In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 3109, pages 445-456, 2004.
[CHSV06] Y.-F. CHien, W.-K. Hon, R. Shah, and J. S. Vitter. Compressed text indexing and range searching. Technical Report TR-06-021, Purdue University, Department of Computer Science, 2006.
[CI04] R. Clifford and C. S. Iliopoulos. Approximate string matching for music analysis. Soft Computing, 8:597-603, 2004.
[CIM+02] M. Crochemore, C. S. Iliopoulos, C. Makris, W. Rytter, A. Tsakalidis, and K. Tsichlas. Approximate string matching with gaps. Nordic Journal of Computing, 9(l):54-65, 2002.
[CIN+05]M. Crochemore, C. S. Iliopoulos, G. Navarro, Y. Pinzon, and A. Salinger. Bit-parallel (?,?)-matching suffix automata. Journal of Discrete Algorithms, 3(2-4):198-214, 2005.
[CIPR00] M. Crochemore, C. S. Iliopoulos, Y. J. Pinzon, and J. F. Reid. A fast and practical bit-vector algorithm for the longest common subsequence problem. In Proc. 11th Australasian Workshop on Combinatorial Algorithms (AWOCA), pages 75-86, 2000.
[CL92] W. I. Chang and J. Lampe. Theoretical and empirical comparisons of approximate string matching algorithms. In Proc. 3rd Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 664, pages 175-184. Springer, 1992.
[CL94] W. I. Chang and E. L. Lawler. Sublinear approximate string matching and biological applications. Algorithmica, 12(4/5):327-344, 1994.
[CL04] C. Charras and T. Lecroq. Handbook of Exact String Matching Algorithms. King's College Publications, 2004.
[Cla96] D. Clark. Compact Pat Trees. PhD thesis, University of Waterloo, Ontario, Canada, 1996.
[CLP98] C. Charras, T. Lecroq, and J. D. Pehoushek. A very fast string matching algorithm for small alphabets and long patterns. In Proc. 9th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1448, pages 55-64. Springer, 1998.
[CLS+06] H.-L. Chan, T. W. Lam, W.-K. Sung, S.-L. Tarn, and S.-S. Wong. A linear size index for approximate pattern matching. In Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4009, pages 49-59. Springer, 2006.
[CM94] W. I. Chang and T. G. Marr. Approximate string matching and local similarity. In Proc. 5th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 807, pages 259-273. Springer, 1994.
[CM05] J. S. Culpepper and A. Moffat. Enhanced byte codes with restricted prefix properties. In Proc. 12th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 3772, pages 1-12. Springer, 2005.
[CM06] J. S. Culpepper and A. Moffat. Phrase-based pattern matching in compressed text. In Proc. 13th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 4209, pages 337-345. Springer, 2006.
[CMRS98] M. Crochemore, F. Mignosi, A. Restivo, and S. Salemi. Data compression using antidictionaries. Technical Report IGM 98-10, Institut Gaspard-Monge, Université de Marne-la-Vallée, France, 1998.
[CN08] F. Claude and G. Navarro. Practical rank/select queries over arbitrary sequences. In Proc. 15th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 5280, pages 176-187. Springer, 2008.
[CO06] L. P. Coelho and A. L. Oliveira. Dotted suffix trees a structure for approximate text indexing. In Proc. 13th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 4209, pages 329-336. Springer, 2006.
[Col91] L. Colussi. Correctness and efficiency of the pattern matching algorithms. Information and Computation, 95(2):225-251, 1991.
[CP91] M. Crochemore and D. Perrin. Two-way string-matching. The Journal of the ACM, 38(3):651-675, 1991.
[CT97] J. G. Cleary and W. J. Teahan. Unbounded length contexts for PPM. The Computer Journal, 40(2/3):67-75, 1997.
[CW84] J. G. Cleary and I. H. Witten. Data compression using adaptive coding and partial string matching. IEEE Transactions on Computers, 32:396-402, 1984.
[Dal02] H. Dalianis. Evaluating a spelling support in a search engine. In Proc. 6th Int. Conference on Applications of Natural Language to Information Systems-Revised Papers, LNCS 2553, pages 183-190, London, UK, 2002. Springer.
[Dam64] F. Damerau. A technique for computer detection and correction of spelling errors. Communications of the ACM, 7(3):171-176, 1964.
[DC05] S. Deorowicz and M. G. Ciura. Correcting spelling errors by modelling their causes. International Journal of Applied Mathematics and Computer Science, 15(2):275-285, 2005.
[Deo02] S. Deorowicz. Second step algorithms in the Burrows-Wheeler compression algorithm. Software-Practice and Experience, 32(2):99-lll, 2002.
[Deo03] S. Deorowicz. Universal lossless data compression algorithms. Ph. D. Thesis, Silesian University of Technology, Gliwice, Poland, 2003.
[DeoOS] S. Deorowicz. Context exhumation after the Burrows-Wheeler transform. Information Processing Letters, 95(1):313-320, 2005.
[Deo06] S. Deorowicz. Speeding up transposition-invariant string matching. Information Processing Letters, 100(1) :14-20, 2006.
[Deo07] S. Deorowicz. Fast algorithm for the constrained longest common subsequence problem. Theoretical and Applied Informatics, 19(2):91-102, 2007.
[Deo09] S. Deorowicz. An algorithm for solving the longest increasing circular subsequence problem. Information Processing Letters, 109(12) :630-634, 2009.
[Deu96] P. Deutsch. [RFC 1951] DEFLATE Compressed Data Format Specification version 1.3, 1996.
[DFG+97] G. Das, R. Fleischer, L. Gąsieniec, D. Gunopulos, and J. Kärkkäinen. Episode matching. In Proc. 8th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1264, pages 12-27. Springer, 1997.
[DG09a] S. Deorowicz and Sz. Grabowski. A hybrid algorithm for the longest common transposition-invariant subsequence problem. Computing and Informatics, 2009. Accepted.
[DG09b] S. Deorowicz and Sz. Grabowski. On two variants of the longest increasing subsequence problem. In Proc, Int. Conference on Man-Machine Interactions (ICMMI), pages 541-549, 2009.
[DHPT09] B. Durian, J. Holub, H. Peltola, and J. Tarhio. Tuning BNDM with q-grams. In Proc. Workshop on Algorithm Engineering and Experiments (ALENEX), pages 29-37. SIAM, 2009.
[DO09] S. Deorowicz and J. Obstój. Constrained longest common subsequence computing algorithms in practice. Technical report, Silesian University of Technology, Gliwice, 2009. http://sun.aei.polsl.pl/ ~sdeor/pub/tr_do2009.pdf.
[Döm64] B. Dömölki. An algorithm for syntactical analysis. Computational Linguistics, 3:29-46, 1964.
[EGGI92] D. Eppstein, Z. Galil, R. Giancarlo, and G. F. Italiano. Sparse dynamic programming I: Linear cost functions. The Journal of the ACM, 39(3):519-545, 1992.
[Eli75] P. Elias. Universal codeword sets and representation of the integers. IEEE Transactions on Information Theory, 21 (2): 194-203, 1975.
[Far97] M. Farach. Optimal suffix tree construction with large alphabets. In Proc. 38th IEEE Annual Symposium on Foundations of Computer Science (FOGS), pages 137-143, 1997.
[Fen96] P. M. Fenwick. Block sorting text compression-final report. Report 130, Department of Computer Science, The University of Auckland, New Zealand, 1996.
[Fer08] P. Ferragina. String algorithms and data structures. CoRR, abs/0801.2378, 2008.
[FF07] P. Ferragina and J. Fischer. Suffix arrays on words. In Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4580, pages 328-339. Springer, 2007.
[FG99] P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its applications. The Journal of the ACM, 46:236-280, 1999.
[FG04] G. Franceschini and R. Grossi. No sorting? Better searching! In Proc. 45th IEEE Annual Symposium on Foundations of Computer Science (FOGS), pages 491-498, 2004.
[FG05a] K. Fredriksson and Sz. Grabowski. Efficient algorithms for (?,?)-matching. Report A-2005-2, Department of Computer Science, University of Joensuu, 2005.
[FG05b] K. Fredriksson and Sz. Grabowski. Practical and optimal string matching. In Proc. 12th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 3772, pages 374-385. Springer, 2005.
[FG06a] K. Fredriksson and Sz. Grabowski. Efficient algorithms for (?,?,?)-matching. In Proc. 11th Prague Stringology Conference (PSC), pages 29-40, 2006.
[FG06b] K. Fredriksson and Sz. Grabowski. Efficient algorithms for pattern matching with general gaps and character classes. In Proc. 13th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 4209, pages 267-278. Springer, 2006.
[FG06c] K. Fredriksson and Sz. Grabowski. Efficient bit-parallel algorithms for (?,?)-matching. In Proc. 5th Workshop on Efficient and Experimental Algorithms (WEA), LNCS 4007, pages 170-181. Springer, 2006.
[FG06d] K. Fredriksson and Sz. Grabowski. A general compression algorithm that supports fast searching. Information Processing Letters, 100(6) :226-232, 2006.
[FG08a] K. Fredriksson and Sz. Grabowski. Efficient algorithms for (?, ?, ?) and (?, k?, ?)-matching. International Journal of Foudations of Computer Science, 19(1):163-184, 2008.
[FG08b] K. Fredriksson and Sz. Grabowski. Efficient algorithms for pattern matching with general gaps, character classes and transposition in-variance. Information Retrieval, 11(4):335-357, 2008.
[FG09a] K. Fredriksson and Sz. Grabowski. Average-optimal string matching. Journal of Discrete Algorithms, 7(4):579-594, 2009.
[FG09b] K. Fredriksson and Sz. Grabowski. Fast convolutions and their applications in approximate string matching. In Pre-Proc. 20th Int. Workshop on Combinatorial Algorithms (IWOCA 2009), pages 363-372, 2009.
[FG09c] K. Fredriksson and Sz. Grabowski. Nested counters in bit-parallel string matching. In Proc. 3rd Int. Conference on Language and Automata Theory and Applications (LATA), LNCS 5457, pages 338-349. Springer, 2009.
[FGM06] P. Ferragina, R. Giancarlo, and G. Manzini. The engineering of a compression boosting library: Theory vs practice in BWT compression. In Proc. 14th Annual European Symposium on Algorithms (ESA), LNCS 4168, pages 756-767. Springer, 2006.
[FGNV09] P. Ferragina, R. Gonzalez, G. Navarro, and R. Venturini. Compressed text indexes: From theory to practice. ACM Journal of Experimental Algorithmics, 13:article 12, 2009. 30 pages.
[FL08] S. Faro and T. Lecroq. Efficient variants of the backward-oracle-matching algorithm. In Proc. 12th Prague Stringology Conference (PSC), pages 146-160, 2008.
[FL09] S. Faro and T. Lecroq. An efficient matching algorithm for encoded DNA sequences and binary strings. In Proc. 20th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 5577, pages 106-115. Springer, 2009.
[FLMM06] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Compressing and searching XML data via two zips. In Proc. 15th Int. Conference on World Wide Web, pages 751-760. ACM, 2006.
[FM00] P. Ferragina and G. Manzini. Opportunistic data structures with applications. In Proc. 41st IEEE Annual Symposium on Foundations of Computer Science (FOCS), pages 390-398, 2000.
[FM01] P. Ferragina and G. Manzini. An experimental study of an opportunistic index. In Proc. 12th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 269-278. SIAM, 2001.
[FMMN04] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. An alphabet-friendly FM-index. In Proc. 11th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 3246, pages 150-160. Springer, 2004.
[FMMN07] P. Ferragina, G. Manzini, V. Mäkinen, and G. Navarro. Compressed representations of sequences and full-text indexes. A CM Transactions on Algorithms (TALG), 3(2):article 20, 2007.
[FMN06] K. Fredriksson, V. Mäkinen, and G. Navarro. Flexible music retrieval in sublinear time. International Journal of Foudations of Computer Science, 17(6):1345-1364, 2006.
[FMN08] J. Fischer, V. Mäkinen, and G. Navarro. An(other) entropy-bounded compressed suffix tree. In Proc. 19th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 5029, pages 152-165. Springer, 2008.
[FN04] K. Fredriksson and G. Navarro. Average-optimal single and multiple approximate string matching. A CM Journal of Experimental Algorithmics, 9:article 1.4, 2004. 45 pages.
[FN07] K. Fredriksson and F. Nikitin. Simple compression code supporting random access and fast string matching. In Proc. 6th Workshop on Efficient and Experimental Algorithms (WEA), LNCS 4525, pages 203-216. Springer, 2007.
[FNP08] A. Farina, G. Navarro, and J. L. Paramá. Word-based statistical compressors as natural language compression boosters. In Proc. Data Compression Conference (DCC), pages 162-171. IEEE Computer Society Press, 2008.
[FP74] M. J. Fischer and M. Paterson. String matching and other products. In Proc. SIAM-AMS Complexity of Computation, pages 113-125, 1974.
[Fre75] M. L. Fredman. On computing the length of longest increasing subsequences. Discrete Mathematics, 11:29-35, 1975.
[Fre00] K. Fredriksson. Fast algorithms for string matching with and without swaps. http://www.cs.uku.fi/~fredriks/pub/papers/sm-w-swaps.pdf, 2000.
[Fre02] K. Fredriksson. Faster string matching with super-alphabets. In Proc. 9th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 2476, pages 44-57. Springer, 2002.
[Fre03] K. Fredriksson. Shift-Or string matching with super-alphabets. Information Processing Letters, 87(4):201-204, 2003.
[FT03] K. Fredriksson and J. Tarhio. Processing of Huffman compressed texts with a super-alphabet. In Proc. 10th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 2857, pages 108-121. Springer, 2003.
[Gag94] Ph. Gage. A new algorithm for data compression. The C Users Journal, 12(2):23-38, 1994.
[Gal78] R. G. Gallager. Variation on a theme by Huffman. IEEE Transactions on Information Theory, 24(6):668-674, 1978.
[GB06] Sz. Grabowski and W. Bieniecki. Simple techniques for plagiarism detection in student programming projects. In Proc. XIV Konferencja Sieci i Systemy Informatyczne-Teoria, Projekty, Wdrożenia, pages 225-228, Łódź, 2006.
[GBYS92] G. H. Gonnet, R. A. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In Information Retrieval Data Structures & Algorithms. Prentice-Hall, 1992.
[GD08] Sz. Grabowski and S. Deorowicz. Nice to be a chimera: A hybrid algorithm for the longest common transposition-invariant subsequence problem. In Proc. Int. Conference on Modern Problems of Radio Engineering, Telecommunications, and Computer Science (TCSET), pages 50-54, Lviv, Ukraine, 2008.
[GF08] Sz. Grabowski and K. Fredriksson. Bit-parallel string matching under Hamming distance in O(n[m/w]) worst case time. Information Processing Letters, 105(5):182-187, 2008.
[GG92] Z. Galil and R. Giancarlo. On the exact complexity of string matching: upper bounds. SIAM Journal on Computing, 21(3):407-437, 1992.
[GGMN05] R. González, Sz. Grabowski, V. Mäkinen, and G. Navarro. Practical implementation of rank and select queries. In Poster Proc. Volume of 4th Workshop on Efficient and Experimental Algorithms (WEA), pages 27-38, Greece, 2005. CTI Press and Ellinika Grammata.
[GGV03] R. Grossi, A. Gupta, and J. S. Vitter. High-order entropy-compressed text indexes. In Proc. 14th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 841-850. SIAM, 2003.
[GGV04] R. Grossi, A. Gupta, and J. S. Vitter. When indexing equals compression: experiments with compressing suffix arrays and applications. In Proc. 15th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 636-645. SIAM, 2004.
[GKS03] R. Giegerich, S. Kurtz, and J. Stoye. Efficient implementation of lazy suffix trees. Software-Practice and Experience, 33(11):1035-1049, 2003.
[GM07] T. Gagie and G. Manzini. Move-to-front, distance coding, and inversion frequencies revisited. In Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPU), LNCS 4580, pages 71-82. Springer, 2007.
[GMN04a] Sz. Grabowski, V. Mäkinen, and G. Navarro. First Huffman, then Burrows-Wheeler: A simple alphabet-independent FM-index. Technical Report TR/DCC-2004-4, University of Chile, Department of Computer Science, 2004.
[GMN04b] Sz. Grabowski, V. Mäkinen, and G. Navarro. First Huffman, then Burrows-Wheeler: An alphabet-independent FM-index. In Proc. 11th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 3246, pages 210-211. Springer, 2004.
[GMNS05] Sz. Grabowski, V. Mäkinen, G. Navarro, and A. Salinger. A simple alphabet-independent FM-index. In Proc. 10th Prague Stringology Conference (PSC), pages 230-244, 2005.
[GMR06] A. Golynski, J. I. Munro, and S. S. Rao. Rank/select operations on large alphabets: a tool for text indexing. In Proc. 10th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 368-373. ACM Press, 2006.
[GN04] Sz. Grabowski and G. Navarro. (mn log ?) time transposition invariant LCS computation. Technical Report TR/DCC-2004-6, University of Chile, Department of Computer Science, 2004. ftp://ftp.dcc.uchile. cl/pub/users/gnavarro/transpszymon.ps.gz.
[GN07a] R. González and G. Navarro. A compressed text index on secondary memory. In Proc. 18th Int. Workshop on Combinatorial Algorithms (IWOCA), pages 80-91. College Publications, UK, 2007.
[GN07b] R. González and G. Navarro. Compressed text indexes with fast locate. In Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4580, pages 216-227, 2007.
[GNP+06] Sz. Grabowski, G. Navarro, R. Przywarski, A. Salinger, and V. Mäkinen. A simple alphabet-independent FM-index. International Journal of Foudations of Computer Science, 17(6):1365-1384, 2006.
[Gog09] S. Gog. Broadword computing and Fibonacci code speed up compressed suffix arrays. In Proc. 8th International Symposium on Experimental Algorithms (SEA), LNCS 5526, pages 161-172. Springer, 2009.
[Gol06] A. Golynski. Optimal lower bounds for rank and select indexes. In Proc. 33rd Int. Colloquium on Automata, Languages and Programming (1CALP), LNCS 4051, pages 370-381. Springer, 2006.
[GPR95] L. Gąsieniec, W. Plandowski, and W. Rytter. Constant-space string matching with smaller number of comparisons: sequential sampling. In Proc. 6th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 937, pages 78-89. Springer, 1995.
[Gra99] Sz. Grabowski. Text preprocessing for Burrows-Wheeler block-sorting compression. In Proc. VII Konferencja Sieci i Systemy Informatyczne-Teoria, Projekty, Wdrożenia, pages 229-239, Łódź, 1999.
[Gra08] Sz. Grabowski. Making dense codes even denser. Automatyka, 12(3):769-779, 2008.
[Gre04] I. Grebnov. The grzipII program. http://magicssoft.ru/?folder=project\&page=GRZipII, 2004.
[Gri07] N. Grimsmo. On performance and cache effects in substring indexes. Technical Report IDI-TR-2007-04, NTNU, Department of Computer and Information Science, Sem Salands vei 7-9, NO-7491 Trondheim, NORWAY, 2007.
[GRRR04] R. F. Geary, N. Rahman, R. Raman, and V. Raman. A simple optimal representation for balanced parentheses. In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 3109, pages 159-172. Springer, 2004.
[GS83] Z. Galil and J. Seiferas. Time-space optimal string matching. Journal of Computer and System Sciences, 26(3):280-294, 1983.
[GS03] Sz. Grabowski and Ł. Sturgulewski. An alternative traversal of the suffix array for the worst case, 2003. Talk at Forum Informatyki Teoretycznej (FIT).
[GT86] H. Gajewska and R. E. Tarjan. Deques with heap order. Information Processing Letters, 22(4):197-200, 1986.
[Gus97] D. Gusfield. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, Cambridge, 1997.
[GV00] R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In Proc. 32nd ACM Symposium on the Theory of Computing (STOC), pages 397-406. ACM Press, 2000.
[Har71] M. C. Harrison. Implementation of the substring test by hashing. Communications of the ACM, 14(12):777-779, 1971.
[Har95] D. Harman. Overview of the second text retrieval conference (TREC-2). Information Processing and Management, 31(3):271-289, 1995.
[HD05] J. Holub and B. Durian. Fast variants of bit parallel approach to suffix automata. Talk given in The Second Haifa Annual International Stringology Research Workshop of the Israeli Science Foundation, 2005. http://www.cri.haifa.ac.il/events/2005/string/ presentations/Holub.pdf.
[Hea78] H. S. Heaps. Information Retrieval-Computational and Theoretical Aspects. Academic Press, 1978.
[HF04] L. He and B. Fang. Linear nondeterministic dawg string matching algorithm. In Proc. 11th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 3246, pages 70-71. Springer, 2004.
[HFN05] H. Hyyrö, K. Fredriksson, and G. Navarro. Increased bit-parallelism for approximate and multiple string matching. ACM Journal of Experimental Algorithmics, 10:article 2.6, 2005. 27 pages. Special issue for best papers of WE A '04.
[HHLS06] T. N. D. Huynh, W.-K. Hon, T. W. Lam, and W.-K. Sung. Approximate string matching using compressed suffix arrays. Theoretical Computer Science, 352(l-3):240-249, 2006.
[Hir75] D. S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18(6):341-343, 1975.
[Hir78a] D. S. Hirschberg. An information-theoretic lower bound for the longest common subsequence problem. Information Processing Letters, 7(1):40-41, 1978.
[Hir78b] D. S. Hirschberg. A lower worst-case complexity for searching a dictionary. In Proc. 16th Annual Allerton Conference on Communication, Control, and Computing, pages 50-53, 1978.
[HN06] H. Hyyrö and G. Navarro. Bit-parallel computation of local similarity score matrices with unitary weights. International Journal of Foudations of Computer Science, 17(6):1325-1344, 2006.
[HorSO] R. N. Horspool. Practical fast searching in strings. Software-Practice and Experience, 10(6):501-506, 1980.
[HS77] J. W. Hunt and T. G. Szymanski. A fast algorithm for computing longest common subsequences. Communications of the ACM, 20(5):350-353, 1977.
[Huf52] D. A. Huffman. A method for the construction of minimum redundancy codes. Proceedings of the IRE, 40(9):1098-1101, 1952.
[Hyy01] H. Hyyrö. Explaining and extending the bit-parallel algorithms of Myers. Technical Report A-2001-10, University of Tampere, Finland, 2001.
[Hyy04] H. Hyyrö. Bit-parallel Ics-length computation revisited. In Proc. 15th Australasian Workshop on Combinatorial Algorithms (AWOCA), pages 16-27, University of Sydney, Australia, 2004.
[IR07] C. S. Iliopoulos and M. S. Rahman. New efficient algorithms for LCS and constrained LCS problem. In Proc. 3rd Algorithms and Complexity in Durham Workshop, Durham, UK, September 2007.
[IR08a] C. S. Iliopoulos and M. S. Rahman. Indexing circular patterns. In Proc. Workshop on Algorithms and Computation, Dhaka, Bangladesh, February 2008.
[IR08b] C. S. Iliopoulos and M. S. Rahman. A new model to solve the swap matching problem and efficient algorithms for short patterns. In Proc. 34th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), LNCS 4910, pages 316-327. Springer, 2008.
[Jac89] G. Jacobson. Succinct Static Data Structures. PhD thesis, Carnegie Mellon University, 1989.
[Joh82] D. B. Johnson. A priority queue in which initialization and queue operations take O(loglogD) time. Mathematical Systems Theory, 15:295-309, 1982.
[KA03] P. Ko and S. Aluru. Space efficient linear time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 2676, pages 200-210. Springer, 2003.
[Kau65] W. Kautz. Fibonacci codes for synchronization control. IEEE Tram-actions on Information Theory, 11:284-292, 1965.
[KB00] S. Kurtz and B. Balkenhol. Space efficient linear time computation of the Burrows and Wheeler transformation. In Numbers, Information and Complexity, pages 375-383. Kluwer Academic Publishers, 2000.
[KLV06] H. Kaplan, S. Landau, and E. Verbin. A simpler analysis of Burrows-Wheeler based compression. In Proc. 17th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 4009, pages 282-293. Springer, 2006.
[KMP77] D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM Journal on Computing, 6(1):323-350, 1977.
[KNM03] J. Kuri, G. Navarro, and L. Me. Fast multipattern search algorithms for intrusion detection. Fundamenta Informaticae, 56(l-2):23-49, 2003.
[Knu73] D. E. Knuth. The art of computer programming: Sorting and searching, volume 3. Addison-Wesley, Reading, MA, 1973.
[Knu85] D. E. Knuth. Dynamic Huffman coding. Journal of Algorithms, 6(2):163-180, 1985.
[KNU03] J. Kärkkäinen, G. Navarro, and E. Ukkonen. Approximate string matching on Ziv-Lempel compressed text. Journal of Discrete Algorithms, l(3/4):313-338, 2003.
[KR87] R. M. Karp and M. O Rabin. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development, 31(2):249-260, 1987.
[KS03] J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Proc. 30th Int. Colloquium on Automata, Languages and Programming (ICALP), LNCS 2719, pages 943-955. Springer, 2003.
[KS05] S. T. Klein and D. Shapira. Pattern matching in Huffman encoded texts. Information rocessing and Management, 41(4):829-841, 2005.
[KSPP03] D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 2676, pages 186-199. Springer, 2003.
[KST94] J.-Y. Kim and J. Shawe-Taylor. Fast string matching using an n-gram algorithm. Software-Practice and Experience, 24(l):79-88, 1994.
[KTSA99] T. Kida, M. Takeda, A. Shinohara, and S. Arikawa. Shift-And approach to pattern matching in LZW compressed text. In Proc. 10th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1645, pages 1-13. Springer, 1999.
[KU96a] J. Kärkkäinen and E. Ukkonen. Lempel-Ziv parsing and sublinear-size index structures for string matching. In Proc. 3rd South American Workshop on String Processing, pages 141-155. Carleton University Press, 1996.
[KU96b] J. Kärkkäinen and E. Ukkonen. Sparse suffix trees. In Proc. 2ndAnnual International Conference on Computing and Combinatorics (COCOON), LNCS 1090, pages 219-230, 1996.
[Kuk92] K. Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4):377-439, 1992.
[Lec07] T. Lecroq. Fast exact string matching algorithms. Information Processing Letters, 102(6):229-235, 2007.
[LNP05] K. Lemström, G. Navarro, and Y. Pinzon. Practical algorithms for transposition-invariant string-matching. Journal of Discrete Algorithms (JDA), 3(2-4):267-292, 2005.
[LS99] J. N. Larsson and K. Sadakane. Faster suffix sorting. Technical Report LU-CS-TR:99-214, LUNDFD6/(NFCS-3140)/1-20/(1999), Department of Computer Science, Lund University, Sweden, May 1999.
[LU00K. Lemström and E. Ukkonen. Including interval encoding into edit distance based music comparison and retrieval. In Proc. of Symposium on Creative & Cultural Aspects and Applications of AI & Cognitive Science, pages 53-60, 2000.
[LV86] G. M. Landau and U. Vishkin. Efficient string matching with k mismatches. Theoretical Computer Science, 43(2-3):239-249, 1986.
[LV89] G. M. Landau and U. Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2):157-169, 1989.
[LZ76] A. Lempel and J. Ziv. On the complexity of finite sequences. IEEE Transactions on Information Theory, 22:75-81, 1976.
[Mäk00] Mäkinen. Compact suffix array. In Proc. 11th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1848, pages 305-319. Springer, 2000.
[Mäk03a] V. Mäkinen. Compact suffix array - a space efficient full-text index. Fundamenta Informaticae, 56(1-2):191-210, 2003. Special Issue - Computing Patterns in Strings.
[Mäk03b] V. Mäkinen. Parameterized approximate string matching and local-similarity-based point-pattern matching. PhD thesis, Department of Computer Science, University of Helsinki, August 2003.
[Man94] U. Manber. A text compression scheme that allows fast searching directly in the compressed file. In Proc. 5th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 807, pages 113-124. Springer, 1994.
[Man0l]G. Manzini. An analysis of the Burrows-Wheeler transform. The Journal of the ACM, 48(3):407-430, 2001. Prelim, version in SODA'99.
[Man04] M. A. Maniscalco. A solution for context based blocksort compression: The M03 algorithm, http://www.michael-maniscalco.com/ papers/m03.pdf, 2004.
[Mas27] H. V. Masters. A study of spelling errors. Ph. D. Thesis, University of Iowa, 1927. Unpublished.
[MB95] A. Moffat and T. A. H. Bell. In situ generation of compressed inverted files. Journal of the American Society for Information Science, 46(7):537-550, 1995.
[McC76] E. M. McCreight. A space-economical suffix tree construction algorithm. Journal of Algorithms, 23(2):262-272, 1976.
[Meh84] K. Mehlhorn. Data structures and algorithms 1: sorting and searching. Springer, 1984.
[Mel96] B. Melichar. String matching with k differences by finite automata. In Proc. 13th International Conference on Pattern Recognition, volume II., pages 256-260. IEEE Computer Society Press, 1996.
[MF04] G. Manzini and P. Ferragina. Engineering a lightweight suffix array construction algorithm. Algorithmica, 40(1):33-50, 2004.
[MHM+01] S. Mitarai, M. Hirao, T. Matsumoto, A. Shinohara, M. Takeda, and S. Arikawa. Compressed pattern matching for SEQUITUR. In Proc. Data Compression Conference (DCC), pages 469+, Snowbird, UT, 2001. IEEE Computer Society Press.
[Mil05] P. B. Miltersen. Lower bounds on the size of selection and rank indexes. In Proc. 16th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 11-12. SIAM, 2005.
[MM90] U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. In Proc. 1st ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 319-327. SIAM, 1990.
[MM91] G. Mehldau and G. Myers. A system for pattern matching applications on biosequences. Computer Applications in the Biosciences, 9(3):299-314, 1991.
[MM93] U. Manber and G. Myers. Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 22(5):935-948, 1993.
[MN04a] V. Mäkinen and G. Navarro. Compressed compact suffix arrays. In Proc. 15th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 3109, pages 420-433. Springer, 2004.
[MN04b] V. Mäkinen and G. Navarro. New search algorithms and time/space tradeoffs for succinct suffix arrays. Technical Report C-2004-20, Department of Computer Science, University of Helsinki, 2004.
[MN05a] M. G. Maaß and J. Nowak. Text indexing with errors. Technical Report TUM-I0503, Fakultät für Informatik, TU München, mar 2005.
[MN05b] M. G. Maaß and J. Nowak. Text indexing with errors. In Proc. 16th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 3537, pages 21-32. Springer, 2005.
[MN05c] V. Mäkinen and G. Navarro. Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing, 12(l):40-66, 2005.
[MN07a] V. Mäkinen and G. Navarro. Implicit compression boosting with applications to self-indexing. In Proc. 14th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 4726, pages 214-226. Springer, 2007.
[MN07b] V. Mäkinen and G. Navarro. Rank and select revisited and extended. Theoretical Computer Science, 387(3) :332-347, 2007.
[MN08] V. Mäkinen and G. Navarro. Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms (TALG), 4(3):article 32, 2008. 38 pages.
[MNF58] G. A. Miller, E. B. Newman, and E. A. Friedman. Length-frequency statistics for written English. Information and Control, 1:370-389, 1958.
[MNU03] V. Mäkinen, G. Navarro, and E. Ukkonen. Approximate matching of run-length compressed strings. Algorithmica, 35:347-369, 2003.
[MNU05] V. Mäkinen, G. Navarro, and E. Ukkonen. Transposition invariant string matching. Journal of Algorithms, 56(2):124-153, 2005.
[MNZ97] E. Moura, G. Navarro, and N. Ziviani. Indexing compressed text. In Proc. 4th South American Workshop on String Processing, pages 95-111. Carleton University Press, 1997.
[MNZBY00] E. Moura, G. Navarro, N. Ziviani, and R. A. Baeza-Yates. Fast and flexible word searching on compressed text. ACM Transactions on Information Systems (TOIS), 18(2):113-139, 2000.
[Mof89] A. Moffat. Word-based text compression. Software-Practice and Experience, 19(2):185-198, 1989.
[MP80] W. J. Masek and M. S. Paterson. A faster algorithm for computing string edit distances. Journal of Computer and System Sciences, 20(1):18-31, 1980.
[MP08] M. A. Maniscalco and S. J. Puglisi. An efficient, versatile approach to suffix sorting. ACM Journal of Experimental Algorithmics (JEA), 12:1-23, 2008.
[MR97] J. I. Munro and V. Raman. Succinct representation of balanced parentheses, static trees and planar graphs. In Proc. 38th IEEE Annual Symposium on Foundations of Computer Science (FOGS), pages 118-126, 1997.
[Mun96] J. I. Munro. Tables. In Proc. 16th Conference on Foundations of Software Technology and Theoretical Computer Science, pages 37-42, London, UK, 1996. Springer.
[MW94] U. Manber and S. Wu. GLIMPSE: A tool to search through entire file systems. In Proc. USENIX Winter 1994 Technical Conference, pages 23-32, San Francisco, CA, 1994.
[Mye86] E. W. Myers. Incremental alignment algorithms and their applications. Report TR-90-25, Department of Computer Science, University of Arizona, Tucson, AZ, 1986.
[Mye96] E. W. Myers. Approximate matching of network expression with spacers. Journal of Computational Biology, 3(1):33-51, 1996.
[Mye98] G. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. In Proc. 9th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1448, pages 1-13. Springer, 1998.
[Mye99] G. Myers. A fast bit-vector algorithm for approximate string matching based on dynamic programming. The Journal of the ACM, 46(3):395-415, 1999.
[Nav98] G. Navarro. Approximate Text Searching. PhD thesis, Department of Computer Science, University of Chile, December 1998. ftp://ftp. dcc.uchile.cl/pub/users/gnavarro/thesis98.ps.gz.
[Nav01a] G. Navarro. A guided tour to approximate string matching. ACM Computing Surveys, 33(l):31-88, 2001.
[Nav01b] G. Navarro. NR-grep: a fast and flexible pattern matching tool. Software-Practice and Experience, 31:1265-1312, 2001.
[Nav04] G. Navarro. Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms, 2(1):87-114, 2004.
[NC06] G. Navarro and E. Chávez. A metric index for approximate string matching. Theoretical Computer Science, 352(1-3) :266-279, 2006.
[NGMD05] G. Navarro, Sz. Grabowski, V. Mäkinen, and S. Deorowicz. Improved time and space complexities for transposition invariant string matching. Technical Report TR/DCC-2005-4, University of Chile, Department of Computer Science, 2005. ftp://ftp.dcc.uchile.cl/pub/users/ gnavarro/mnloglogs.ps.gz.
[NM07] G. Navarro and V. Mäkinen. Compressed full-text indexes. ACM Computing Surveys, 39(l):article 2, 2007.
[NMN+00] G. Navarro, E. Moura, M. Neubert, N. Ziviani, and R. Baeza-Yates. Adding compression to block addressing inverted indexes. Information Retrieval, 3(l):49-77, 2000.
[NMW97] C. G. Nevill-Manning and I. H. Witten. Compression and explanation using hierarchical grammars. The Computer Journal, 40(2/3): 103-116, 1997. \
[NR98] G. Navarro and M. Raffinot. A bit-parallel approach to suffix automata: Fast extended string matching. In Proc. 9th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1448, pages 14-33. Springer, 1998.
[NR00] G. Navarro and M. Raffinot. Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM Journal of Experimental Algorithmics, 5(4), 2000.
[NR02] G. Navarro and M. Raffinot. Flexible Pattern Matching in Strings - 1 Practical on-line search algorithms for texts and biological sequences. Cambridge University Press, 2002. ISBN 0-521-81307-7. 280 pages.
[NR03] G. Navarro and M. Raffinot. Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology, 10(6):903-923, 2003.
[NST05] G. Navarro, E. Sutinen, and J. Tarhio. Indexing text with approximate q-grams. Journal of Discrete Algorithms (JDA), 3(2-4): 157-175, 2005.
[NT00] G. Navarro and J. Tarhio. Boyer-Moore string matching over Ziv-Lempel compressed text. In Proc. 11th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1848, pages 166-180, Montreal, Canada, 2000. Springer.
[OS07] D. Okanohara and K. Sadakane. Practical entropy-compressed rank/select dictionary. In Proc. Workshop on Algorithm Engineering and Experiments (ALENEX). SIAM, 2007.
[Pag99] R. Pagh. Low redundancy in static dictionaries with O(1) worst case lookup time. In Proc. 26th Int. Colloquium on Automata, Languages and Programming (ICALP), LNCS 1644, pages 595-604. Springer, 1999.
[Păt08] M. Pătraşcu. Succincter. In FOCS, pages 305-313. IEEE Computer Society, 2008.
[Păt09] M. Pătraşcu. A lower bound for succinct rank queries.CoRR, abs/0907.1103, 2009.
[Pen03] Ch.-L. Peng. An approach for solving the constrained longest common subsequence problem. Master's thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, Taiwan, 2003. http://etd.lib.nsysu.edu.tw/ETD-db/ETD-search/getfile?URN= etd-0828103-125439&filename=etd-0828103-125439.pdf.
[PGNS06] R. Przywarski, Sz. Grabowski, G. Navarro, and A. Salinger. FM-KZ: An even simpler alphabet-independent FM-Index. In Proc. 11th Prague Stringology Conference (PSC), pages 226-240, 2006.
[PS80] W. Paul and J. Simon. Decision trees and random access machines. In ZUERICH: Proc. Symp. Logik und Algorithmik, pages 331-340, 1980.
[PT03] H. Peltola and J. Tarhio. Alternative algorithms for bit-parallel string matching. In Proc. 10th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 2857, pages 80-94. Springer, 2003.
[PW05] Y. J. Pinzón and S. Wang. Simple algorithm for pattern-matching with bounded gaps in genomic sequences. In Proc. Int. Conference of Numerical Analysis and Applied Mathematics (ICNAAM), pages 827-831, 2005.
[Riv77] R. L. Rivest. On the worst case behavior of string searching algorithms. SIAM Journal on Computing, 6(4):669-674, 1977.
[RNO07] L. Russo, G. Navarro, and A. Oliveira. Approximate string matching with Lempel-Ziv compressed indexes. In Proc. 14th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 4726, pages 265-275. Springer, 2007.
[RNO08a] L. Russo, G. Navarro, and A. Oliveira. Dynamic fully-compressed suffix trees. In Proc. 19th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 5029, pages 191-203, 2008.
[RNO08b] L. Russo, G. Navarro, and A. Oliveira. Fully-compressed suffix trees. In Proc. 8th Latin American Symposium (LATIN), LNCS 4957, pages 362-373, 2008.
[RRR02] R. Raman, V. Raman, and S. S. Rao. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In Proc. 13th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 233-242. SIAM, 2002.
[RTT02] J. Rautio, J. Tanninen, and J. Tarhio. String matching with stopper encoding and code splitting. In Proc. 13th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 2373, pages 42-52. Springer, 2002.
[Rya80] B. Y. Ryabko. Data compression by means of a book stack. Problems of Information Transmission, 16(4):16-21, 1980.
[Ryt99] W. Rytter. Algorithms on compressed strings and arrays. In Proc. 26th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), LNCS 1725, pages 48-65. Springer, 1999.
[Sad00] K. Sadakane. Compressed text databases with efficient query algorithms based on the compressed suffix array. In Proc. 11th Int. Symposium on Algorithms and Computation (ISAAC), LNCS 1969, pages 410-421. Springer, 2000.
[Sad02] K. Sadakane. Succinct representations of 1cp information and improvements in the compressed suffix arrays. In Proc. 13th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 225-232. SIAM, 2002.
[Sad07] K. Sadakane. Compressed suffix trees with full functionality. Theoretical Computer Science, 41(4):589-607, 2007.
[Sch61] C. Schensted. Largest increasing and decreasing subsequences. Canadian Journal of Mathematics, 13:179-191, 1961.
[Sew06] J. Seward. The bzip2 program, http://www.bzip.org/, 2006.
[SG06] K. Sadakane and R. Grossi. Squeezing succinct data structures into entropy bounds. In Proc. 17th ACM-SIAM Annual Symposium on Discrete Algorithms (SODA), pages 1230-1239. ACM Press, 2006.
[SGD05] P. Skibiński, Sz. Grabowski, and S. Deorowicz. Revisiting dictionary-based compression. Software-Practice and Experience, 35(15):1455-1476, 2005.
[SGS06] J. Swacha, Sz. Grabowski, and P. Skibiński. Efektywna reprezentacja dokument6w XML (in Polish). In Badania Operacyjne i Systemowe (BOS 2006), vol. 3, pages 355-366, Szczecin, Poland, 2006. Akademicka Oficyna Wydawnicza EXIT.
[SGS07] P. Skibiński, Sz. Grabowski, and J. Swacha. Fast transform for effective XML compression. In Proc. 9th International Conference The Experience of Designing and Application of CAD Systems in Microelectronics, pages 323-326, Polyana, Ukraine, 2007.
[SGS08] P. Skibiński, Sz. Grabowski, and J. Swacha. Effective asymmetric XML compression. Software-Practice and Experience, 38(10):1027-1047, 2008.
[Shk02] D. Shkarin. PPM: One step to practicality. In Proc. Data Compression Conference (DCC), pages 202-211, Snowbird, UT, 2002. IEEE Computer Society Press.
[Sim93] I. Simon. String matching algorithms and automata. In Proc. 1st South American Workshop on String Processing, pages 151-157, Universidade Federal de Minas Gerais, Brazil, 1993.
[Smi91] P. D. Smith. Experiments with a very fast substring search algorithm. Software-Practice and Experience, 21(10):1065-1074, 1991.
[SS01] K. Sadakane and T. Shibuya. Indexing huge genome sequences for solving various problems. Genome Informatics, 12:175-183, 2001.
[SS07] P. Skibiński and J. Swacha. Combining efficient XML compression with query processing. In Proc. 11th East European Conference on Advances in Databases and Information Systems (ADBIS), LNCS 4690, pages 330-342. Springer, 2007.
[SSG08] P. Skibiński, J. Swacha, and Sz. Grabowski. A highly efficient XML compression scheme for the Web. In Proc. 34th Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM), LNCS 4910, pages 766-777. Springer, 2008.
[ST07] P. Sanders and F. Transier. Intersection in integer inverted indices. In Proc. Workshop on Algorithm Engineering and Experiments (ALENEX), pages 71-83. SIAM, 2007.
[Ste92] G. A. Stephen. String search. Report TR-92-gas-Ol, University College of North Wales, 1992.
[STSA99] Y. Shibata, M, Takeda, A. Shinohara, and S. Arikawa. Pattern matching in text compressed by using antidictionaries. In Proc. 10th Annual Symposium on Combinatorial Pattern Matching (CPM), LNCS 1645, pages 37-49, Warwick University, UK, 1999. Springer.
[Sun90] D. M. Sunday. A very fast substring search algorithm. Communications of the ACM, 33(8):132-142, 1990.
[SZM03] W. Sun, N. Zhang, and A. Mukherjee. A dictionary-based multi-corpora text compression system. In Proc. Data Compression Conference (DCC), page 448, Snowbird, UT, 2003. IEEE Computer Society Press.
[Tis08] A. Tiskin. Semi-local string comparison: Algorithmic techniques and applications. Mathematics in Computer Science, 1(4):571-603, 2008.
[TMK+02] M. Takeda, S. Miyamoto, T. Kida, A. Shinohara, S. Fukumachi, T. Shinohara, and S. Arikawa. Processing text files as is: Pattern matching over compressed tests. In Proc. 9th Int. Symposium on String Processing and Information Retrieval (SPIRE), LNCS 2476, pages 170-186. Springer, 2002.
[TP97] J. Tarhio and H. Peltola. String matching in the DNA alphabet. Software-Practice and Experience, 27(7):851-861, 1997.
[TS08] F. Transier and P. Sanders. Intersection in integer inverted indices. In Proc. Workshop on Algorithm Engineering and Experiments (ALENEX), pages 3-12. SIAM, 2008.
[Tsa03] Y. T. Tsai. The constrained common subsequence problem. Information Processing Letters, 88:173-176, 2003.
[TSM+01] M. Takeda, Y. Shibata, T. Matsumoto, T. Kida, A. Shinohara, S. Fukamachi, T. Shinohara, and S. Arikawa. Speeding up string pattern matching by text compression: The dawn of a new era. Transactions of Information Processing Society of Japan, 42(3):370-384, 2001.
[Ukk85a] E. Ukkonen. Algorithms for approximate string matching. Information and Control, 64(1-3):100-118, 1985.
[Ukk85b] E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6(1-3): 132-137, 1985.
[Ukk95] E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249-260, 1995.
[vEBKZ77] P. van Emde Boas, R. Kaas, and E. Zijlstra. Design and implementation of an efficient priority queue. Mathematical Systems Theory, 10:99-127, 1977.
[Vig08] S. Vigna. Broadword implementation of rank/select queries. In Proc. 7th Workshop on Efficient and Experimental Algorithms (WEA), LNCS 5038, pages 154-168. Springer, 2008.
[Wan03] R. Wan. Browsing and Searching Compressed Documents. Ph. D. Thesis, University of Melbourne, Australia, 2003.
[WC76] C. K. Wong and A. K. Chandra. Bounds for the string editing problem. The Journal of the ACM, 23(1):13-16, 1976.
[Wei73] P. Weiner. Linear pattern matching algorithm. In Proc. 14th Annual IEEE Symposium on Switching and Automata Theory, pages 1-11, Washington, DC, 1973.
[WF74] R. A. Wagner and M. Fischer. The string-to-string correction problem. The Journal of the ACM, 21(1):168-173, 1974.
[WM92a] S. Wu and U. Manber. Agrep - a fast approximate pattern-matching tool. In Proc. USENIX Winter 1992 Technical Conference, pages 153-162, San Francisco, CA, 1992.
[WM92b] S. Wu and U. Manber. Fast text searching allowing errors. Communications of the ACM, 35(10):83-91, 1992.
[WM94] S. Wu and U. Manber. A fast algorithm for multi-pattern searching. Report TR-94-17, Department of Computer Science, University of Arizona, Tucson, AZ, 1994.
[WMB99] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, San Francisco, CA, 1999.
[Yao79] A. C. Yao. The complexity of pattern matching for a random string. SIAM Journal on Computing, 8(3):368-387, 1979.
[YC08] I.-H. Yang and Y.-C. Chen. Fast algorithms for the constrained longest increasing subsequence problems. In Proc. 25th Workshop on Combinatorial Mathematics and Computation Theory, pages 226-231, 2008.
[Zec72] E. Zeckendorf. Representation des nombres naturels par une somme de nombres de Fibonacci ou de nombres Lucas. Bulletin de la Société Royale des Sciences de Liege, 41:179-182, 1972.
[ZL77] J. Ziv and A. Lempel. A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 23(3) :337-343, 1977.
[ZL78] J. Ziv and A. Lempel. Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5):530-536, 1978.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-article-LOD6-0008-0005