PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Powiadomienia systemowe
  • Sesja wygasła!
  • Sesja wygasła!
  • Sesja wygasła!
Tytuł artykułu

Simple Random Access Compression

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Given a sequence S of n symbols over some alphabet Σ of size σ, we develop new compression methods that are (i) very simple to implement; (ii) provide O(1) time random access to any symbol (or short substring) of the original sequence. Our simplest solution uses at most 2h+o(h) bits of space, where h = n(H_o(S)+1), and H_o(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. For example, we can achieve n(H_k(S)+1)+o(n(H_k(S)+1)) bits of space, for k = o(logσ(n)). Several applications are discussed, including text compression, (compressed) full-text indexing and string matching.
Wydawca
Rocznik
Strony
63--81
Opis fizyczny
Bibliogr. 39 poz., tab., wykr.
Twórcy
autor
Bibliografia
  • [1] Amir, A., Benson, G.: Two-dimensional periodicity and its applications, Proceedings of SODA'92, 1992.
  • [2] Baeza-Yates, R. A., Gonnet, G. H.: A new approach to text searching, Commun. ACM, 35(10), 1992, 74-82.
  • [3] Brisaboa, N., Iglesias, E., Navarro, G., Paramá, J.: An Efficient Compression Code for Text Databases, Proceedings of ECIR'03, LNCS 2633, 2003.
  • [4] Brown, J. L.: Zeckendorf's Theorem and Some Applications, Fib. Quart., 2, 1964, 163-168.
  • [5] Brown, J. L.: A New Characterization of the Fibonacci Numbers, Fib. Quart., 3, 1965, 1-8.
  • [6] Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994.
  • [7] Clark, D. R.: Compact Pat Trees, Ph.D. Thesis, University of Waterloo, Ontario, Canada, 1998.
  • [8] Elias, P.: Universal codeword sets and representation of the integers, IEEE Transactions on Information Theory, 21(2), 1975, 194-203.
  • [9] Ferragina, P., Manzini, G.: Opportunistic data structures with applications, Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS 2000), IEEE Computer Society, Washington, DC, USA, 2000.
  • [10] Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds, Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'07), SIAM, 2007.
  • [11] Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds., Theor. Comput. Sci., 372(1), 2007, 115-121.
  • [12] Fredriksson, K.: Shift-or string matching with super-alphabets, Information Processing Letters, 87(1), 2003, 201-204.
  • [13] Fredriksson, K., Nikitin, F.: Compressed representation of sequences with constant time random access, Technical Report A-2006-3, Department of Computer Science and Statistics, University of Joensuu, 2006.
  • [14] Fredriksson, K., Nikitin, F.: Simple compression code supporting random access and fast string matching, Proc. 6th Workshop on Efficient and Experimental Algorithms (WEA'07), LNCS 4525, Springer-Verlag, 2007.
  • [15] Golynski, A., Munro, I., Rao, S. S.: Rank/select operations on large alphabets: a tool for text indexing, Proceedings of SODA'06, ACM Press, 2006.
  • [16] González, R., Navarro, G.: Statistical Encoding of Succinct Data Structures, Proceedings of the 17th Annual Symposium on Combinatorial Pattern Matching (CPM 2006), LNCS 4009, 2006.
  • [17] Grabowski, S., Navarro, G., Przywarski, R., Salinger, A., Mäkinen, V.: A Simple Alphabet-Independent FM-Index, International Journal of Foundations of Computer Science (IJFCS), 17(6), 2006, 1365-1384.
  • [18] Heaps, H. S.: Information retrieval: theoretical and computational aspects, Academic Press, New York, NY, 1978.
  • [19] Horspool, R. N.: Practical fast searching in strings, Softw. Pract. Exp., 10(6), 1980, 501-506.
  • [20] Huffman, D. A.: A method for the construction of minimum redundancy codes, Proceedings of I.R.E, 40, 1951, 1098-1101.
  • [21] Jacobson, G.: Succinct static data structures, Ph.D. Thesis, Carnegie Mellon University, 1989.
  • [22] Kim, D. K., Na, J. C., Kim, J. E., Park, K.: Efficient Implementation of Rank and Select Functions for Succinct Representation, Proceedings of WEA, 3503, Springer, 2005.
  • [23] Mäkinen, V., Navarro, G.: Rank and Select Revisited and Extended, Theoretical Computer Science, 387(3), 2007, 332-347, Special issue on "The Burrows-Wheeler Transform and its Applications".
  • [24] Manber, U.: A Text Compression Scheme that Allows Fast Searching Directly in the Compressed File, ACM Trans. Inform. Syst., 15(2), 1997, 124-136.
  • [25] Manber, U.,Myers, G.: Suffix arrays: a newmethod for on-line string searches, SIAMJournal on Computing, 22(5), 1993, 935-948.
  • [26] Moura, E., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and FlexibleWord Searching on Compressed Text, ACM Transactions on Information Systems (TOIS), 18(2), 2000, 113-139.
  • [27] Munro, J. I.: Tables., Proceedings of FSTTCS'96, LNCS 1180, Springer, 1996.
  • [28] Navarro, G., Mäkinen, V.: Compressed Full-Text Indexes, ACM Computing Surveys, 39(1), 2007, article 2.
  • [29] Navarro, G., Raffinot, M.: Fast and Flexible String Matching by Combining Bit-Parallelism and Suffix Automata, ACM Journal of Experimental Algorithmics (JEA), 5(4), 2000.
  • [30] Okanohara, D., Sadakane, K.: Practical Entropy-Compressed Rank/Select Dictionary, Proceedings of ALENEX'07, ACM Press, 2007.
  • [31] Pagh, R.: Low Redundancy in Static Dictionaries with O(1) Worst Case Lookup Time, Proceedings of ICALP'99, Springer-Verlag, 1999.
  • [32] Raman, R., Raman, V., Rao, S. S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets, Proceedings of SODA'02, ACM Press, 2002.
  • [33] Sadakane, K., Grossi, R.: Squeezing succinct data structures into entropy bounds., Proceedings of SODA'06, ACM Press, 2006.
  • [34] Sunday, D. M.: A very fast substring search algorithm, Commun. ACM, 33(8), 1990, 132-142.
  • [35] Weiner, P.: Linear pattern matching algorithm, Proc. 14th Annual IEEE Symposium on Switching and Automata Theory,Washington, DC, 1973.
  • [36] Witten, I. H., Moffat, A., Bell, T. C.: Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers, San Francisco, CA, 1999, ISBN 1-55860-570-3.
  • [37] Witten, I. H., Neal, R.M., Cleary, J. G.: Arithmetic coding for data compression. Commun. ACM, 30(6):520, Communications of the ACM, 30(6), 1987, 520.
  • [38] Yao, A. C.: The complexity of pattern matching for a random string, SIAM J. Comput., 8(3), 1979, 368-387.
  • [39] Ziv, J., Lempel, A.: A universal algorithm for sequential data compression, IEEE Trans. Inf. Theory, 23, 1977, 337-343.
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-article-BUS8-0004-0064
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.