Czasopismo
2009
|
Vol. 92, nr 1-2
|
63-81
Tytuł artykułu
Autorzy
Wybrane pełne teksty z tego czasopisma
Warianty tytułu
Języki publikacji
Abstrakty
Given a sequence S of n symbols over some alphabet Σ of size σ, we develop new compression methods that are (i) very simple to implement; (ii) provide O(1) time random access to any symbol (or short substring) of the original sequence. Our simplest solution uses at most 2h+o(h) bits of space, where h = n(H_o(S)+1), and H_o(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. For example, we can achieve n(H_k(S)+1)+o(n(H_k(S)+1)) bits of space, for k = o(logσ(n)). Several applications are discussed, including text compression, (compressed) full-text indexing and string matching.
Czasopismo
Rocznik
Tom
Strony
63-81
Opis fizyczny
Bibliogr. 39 poz., tab., wykr.
Twórcy
autor
autor
- Department of Computer Science University of Kuopio P.O. Box 1627, 70211 Kuopio, Finland, kimmo.fredriksson@uku.fi
Bibliografia
- [1] Amir, A., Benson, G.: Two-dimensional periodicity and its applications, Proceedings of SODA'92, 1992.
- [2] Baeza-Yates, R. A., Gonnet, G. H.: A new approach to text searching, Commun. ACM, 35(10), 1992, 74-82.
- [3] Brisaboa, N., Iglesias, E., Navarro, G., Paramá, J.: An Efficient Compression Code for Text Databases, Proceedings of ECIR'03, LNCS 2633, 2003.
- [4] Brown, J. L.: Zeckendorf's Theorem and Some Applications, Fib. Quart., 2, 1964, 163-168.
- [5] Brown, J. L.: A New Characterization of the Fibonacci Numbers, Fib. Quart., 3, 1965, 1-8.
- [6] Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994.
- [7] Clark, D. R.: Compact Pat Trees, Ph.D. Thesis, University of Waterloo, Ontario, Canada, 1998.
- [8] Elias, P.: Universal codeword sets and representation of the integers, IEEE Transactions on Information Theory, 21(2), 1975, 194-203.
- [9] Ferragina, P., Manzini, G.: Opportunistic data structures with applications, Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS 2000), IEEE Computer Society, Washington, DC, USA, 2000.
- [10] Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds, Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'07), SIAM, 2007.
- [11] Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds., Theor. Comput. Sci., 372(1), 2007, 115-121.
- [12] Fredriksson, K.: Shift-or string matching with super-alphabets, Information Processing Letters, 87(1), 2003, 201-204.
- [13] Fredriksson, K., Nikitin, F.: Compressed representation of sequences with constant time random access, Technical Report A-2006-3, Department of Computer Science and Statistics, University of Joensuu, 2006.
- [14] Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching, Proc. 6th Workshop on Efficient and Experimental Algorithms (WEA'07), LNCS 4525, Springer-Verlag, 2007.
- [15] Golynski, A., Munro, I., Rao, S. S.: Rank/select operations on large alphabets: a tool for text indexing, Proceedings of SODA'06, ACM Press, 2006.
- [16] González, R., Navarro, G.: Statistical Encoding of Succinct Data Structures, Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006), LNCS 4009, 2006.
- [17] Grabowski, S., Navarro, G., Przywarski, R., Salinger, A., Mäkinen, V.: A Simple Alphabet-Independent FM-Index, International Journal of Foundations of Computer Science (IJFCS), 17(6), 2006, 1365-1384.
- [18] Heaps, H. S.: Information retrieval: theoretical and computational aspects, Academic Press, New York, NY, 1978.
- [19] Horspool, R. N.: Practical fast searching in strings, Softw. Pract. Exp., 10(6), 1980, 501-506.
- [20] Huffman, D. A.: A method for the construction of minimum redundancy codes, Proceedings of I.R.E, 40, 1951, 1098-1101.
- [21] Jacobson, G.: Succinct static data structures, Ph.D. Thesis, Carnegie Mellon University, 1989.
- [22] Kim, D. K., Na, J. C., Kim, J. E., Park, K.: Efficient Implementation of Rank and Select Functions for Succinct Representation, Proceedings of WEA, 3503, Springer, 2005.
- [23] Mäkinen, V., Navarro, G.: Rank and Select Revisited and Extended, Theoretical Computer Science, 387(3), 2007, 332-347, Special issue on "The Burrows-Wheeler Transform and its Applications".
- [24] Manber, U.: A Text Compression Scheme that Allows Fast Searching Directly in the Compressed File, ACM Trans. Inform. Syst., 15(2), 1997, 124-136.
- [25] Manber, U.,Myers, G.: Suffix arrays: a newmethod for on-line string searches, SIAMJournal on Computing, 22(5), 1993, 935-948.
- [26] Moura, E., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and FlexibleWord Searching on Compressed Text, ACM Transactions on Information Systems (TOIS), 18(2), 2000, 113-139.
- [27] Munro, J. I.: Tables., Proceedings of FSTTCS'96, LNCS 1180, Springer, 1996.
- [28] Navarro, G., Mäkinen, V.: Compressed Full-Text Indexes, ACM Computing Surveys, 39(1), 2007, article 2.
- [29] Navarro, G., Raffinot, M.: Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata, ACM Journal of Experimental Algorithmics (JEA), 5(4), 2000.
- [30] Okanohara, D., Sadakane, K.: Practical Entropy-Compressed Rank/Select Dictionary, Proceedings of ALENEX'07, ACM Press, 2007.
- [31] Pagh, R.: Low Redundancy in Static Dictionaries with O(1) Worst Case Lookup Time, Proceedings of ICALP'99, Springer-Verlag, 1999.
- [32] Raman, R., Raman, V., Rao, S. S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets, Proceedings of SODA'02, ACM Press, 2002.
- [33] Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds., Proceedings of SODA'06, ACM Press, 2006.
- [34] Sunday, D. M.: A very fast substring search algorithm, Commun. ACM, 33(8), 1990, 132-142.
- [35] Weiner, P.: Linear pattern matching algorithm, Proc. 14th Annual IEEE Symposium on Switching and Automata Theory,Washington, DC, 1973.
- [36] Witten, I. H., Moffat, A., Bell, T. C.: Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers, San Francisco, CA, 1999, ISBN 1-55860-570-3.
- [37] Witten, I. H., Neal, R.M., Cleary, J. G.: Arithmetic coding for data compression. Commun. ACM, 30(6):520, Communications of the ACM, 30(6), 1987, 520.
- [38] Yao, A. C.: The complexity of pattern matching for a random string, SIAM J. Comput., 8(3), 1979, 368-387.
- [39] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression, IEEE Trans. Inf. Theory, 23, 1977, 337-343.
Typ dokumentu
Bibliografia
Identyfikatory
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS8-0004-0064