Compressing annotated natural language text

Swacha, J.

Article details

Article title

Compressing annotated natural language text

Authors

Swacha J.

Selected full texts from this journal

https://journals.pan.pl/acs/

Conference

Human Language Technologies as a challenge for Computer Science and Linguistics (2; 21-23.04.2005; Poznań, Poland)

Abstract

The paper describes and evaluates a new method for compressing linguistically annotated text. It proposes a semantically motivated transcoding scheme in which the text is split into three distinct streams of data. Applying the scheme reduces the compressed text length by as much as 67% compared with the initial compression algorithm. An important advantage of the method is that text can be processed in its compressed form.
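The abstract does not state what the three streams contain, so the following is only a rough illustration of the general stream-splitting idea, not the paper's scheme. It assumes Penn Treebank-style `word/TAG` input and an assumed split into a word stream, a tag stream and a separator stream, each compressed independently with Python's standard `zlib` module as a stand-in back end. The helper names (`split_streams`, `merge_streams`, `encode`, `decode`, `count_tag`) are hypothetical.

```python
import re
import zlib

# ASSUMPTION: input is Penn Treebank-style "word/TAG" tokens separated by
# whitespace, with no leading whitespace and no NUL characters.
# ASSUMPTION: the three streams are words, tags and separators; the paper's
# actual stream definitions are not given in this record.
TOKEN_RE = re.compile(r"(\S+)(\s*)")
SEP = "\x00"  # delimiter used to serialise each stream before compression


def split_streams(tagged: str) -> tuple[list[str], list[str], list[str]]:
    """Split tagged text into hypothetical word, tag and separator streams."""
    words, tags, seps = [], [], []
    for m in TOKEN_RE.finditer(tagged):
        word, _, tag = m.group(1).rpartition("/")  # last "/" separates the tag
        words.append(word)
        tags.append(tag)
        seps.append(m.group(2))  # whitespace that followed the token
    return words, tags, seps


def merge_streams(words: list[str], tags: list[str], seps: list[str]) -> str:
    """Re-interleave the three streams into the original tagged text."""
    return "".join(f"{w}/{t}{s}" for w, t, s in zip(words, tags, seps))


def encode(tagged: str) -> list[bytes]:
    """Compress each stream independently (zlib stands in for the real back end)."""
    return [zlib.compress(SEP.join(s).encode("utf-8"), 9) for s in split_streams(tagged)]


def decode(blobs: list[bytes]) -> str:
    """Decompress all three streams and rebuild the tagged text."""
    words, tags, seps = (zlib.decompress(b).decode("utf-8").split(SEP) for b in blobs)
    return merge_streams(words, tags, seps)


def count_tag(blobs: list[bytes], tag: str) -> int:
    """Answer a tag query from the tag stream alone; words stay compressed."""
    return zlib.decompress(blobs[1]).decode("utf-8").split(SEP).count(tag)


if __name__ == "__main__":
    sample = "The/DT cat/NN sat/VBD on/IN the/DT mat/NN ./.\n"
    blobs = encode(sample)
    assert decode(blobs) == sample
    print("NN tags:", count_tag(blobs, "NN"))
    print("interleaved:", len(zlib.compress(sample.encode("utf-8"), 9)),
          "separate:", sum(len(b) for b in blobs))
```

A tag query here decompresses only one of the three blobs, which hints at why separated streams make processing of compressed text easier. Whether the separate streams end up smaller than deflating the interleaved text depends on the input; this toy is not meant to reproduce the reported 67% figure.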

Keywords

text compression, text transcoding, tagged text, POS tags

Publisher

Polish Academy of Sciences, Committee of Automatic Control and Robotics

Journal

Archives of Control Sciences

Year

2005

Volume

Vol. 15, no. 4

Pages

673-680

Physical description

Bibliography: 10 items; tables

Contributors

author

Swacha J.

Szczecin University, Institute of Information Technology in Management, Mickiewicza 64, 71-101 Szczecin, jakubs@uoo.univ.szczecin.pl

Bibliography

[1] J. Cheney: Compressing XML with Multiplexed Hierarchical PPM Models. In Proc. IEEE Data Compression Conf., Snowbird, Utah, IEEE Computer Society, (2001), 163-172.
[2] J. Davies, D. Fensel and F. van Harmelen (Eds.): Towards the Semantic Web: Ontology-driven knowledge management. Chichester, John Wiley & Sons, 2003.
[3] R. N. Horspool and G. V. Cormack: Constructing word-based text compression algorithms. In Proc. IEEE Data Compression Conf., Snowbird, Utah, IEEE Computer Society, (1992), 62-71.
[4] D. A. Huffman: A Method for the Construction of Minimum-Redundancy Codes. Proc. IRE, 40 (1952), 1098-1101.
[5] K. Nagao: Digital Content Annotation and Transcoding. Boston, Massachusetts, Artech House, 2003.
[6] R. Richardson and A. F. Smeaton: Using WordNet in a Knowledge-Based Approach to Information Retrieval. Working Paper CA0395, School of Computer Applications, Dublin City University, 1995.
[7] B. Santorini: Part-of-Speech Tagging Guidelines for the Penn Treebank Project. Technical Report MS-CIS-90-47, Department of Computer and Information Science, University of Pennsylvania, 1990.
[8] P. Skibiński, Sz. Grabowski and S. Deorowicz: Revisiting dictionary-based compression. To appear in Software: Practice and Experience, (2005).
[9] W. J. Teahan and J. G. Cleary: Tag Based Models of English Text. In Proc. IEEE Data Compression Conf., Snowbird, Utah, IEEE Computer Society, (1998), 43-52.
[10] P. M. Tolani and J. R. Haritsa: XGrind: A Query-friendly XML Compressor. In Proc. 18th IEEE Int. Conf. on Data Engineering, San Jose, California, IEEE Computer Society, (2002).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BSW3-0021-0038