Article title

Najbardziej znane korpusy tekstów : opracowanie przeglądowe

Title variants
EN
Most popular text corpora : the survey
Publication languages
PL
Abstracts
PL
Niniejszy raport opisuje najbardziej znane korpusy tekstów języka naturalnego. Wpierw analizowane są zasady konstruowania korpusu, czyli doboru składających się nań tekstów w zależności od przyjętego rozmiaru oraz określenia jego struktury. Następnie prezentowane są najbardziej znane korpusy, głównie anglojęzyczne, lecz także innych języków europejskich: francuskiego, niemieckiego, rosyjskiego i czeskiego. Szczególną uwagę poświęcono dwóm korpusom polskim: Korpusowi IPI PAN oraz Narodowemu Korpusowi Języka Polskiego. Oddzielny rozdział poświęcony jest bankom drzew, czyli korpusom znakowanym syntaktycznie.
EN
This report surveys the best-known corpora of natural-language texts. It first analyses the principles of corpus construction, namely determining the structure of a corpus and selecting the texts to include in it. It then presents the most popular corpora: most are English, but corpora of other European languages (French, German, Czech and Russian) are covered as well. Special attention is paid to two Polish corpora, the IPI PAN Corpus and the National Corpus of Polish. A separate section is devoted to treebanks, i.e., syntactically annotated corpora.
Year
Volume
Pages
1-56
Physical description
Bibliography: 117 items.
Contributors
author
  • Instytut Podstaw Informatyki PAN, ul. Ordona 21, 01-237 Warsaw, Poland
Bibliography
  • A. Abeille (ed.) (2003) Treebanks: Building and Using Parsed Corpora, Kluwer Academic Publishers, Dordrecht, Netherlands.
  • A. Abeille, L. Clement (1999) A tagged reference corpus for French, in: Proceedings of the LINC'99 Workshop at EACL'99, Bergen, Norway.
  • A. Abeille, L. Clement, R. Reyes (1998) TALANA annotated corpus for French: the first results, in: LREC (1998), pp. 992-999.
  • A. Abeille, L. Clement, F. Toussenel (2003) Building a Treebank for French, in: Abeille (2003).
  • ACL (1998) Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL'98), Montreal, Canada.
  • J. Apresjan, I. Boguslavsky, L. Iomdin, B. Iomdin, A. Sannikov, V. Sizov (2006) A Syntactically and Semantically Tagged Corpus of Russian: State of the Art and Prospects, in: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006), pp. 1378-1381, Genoa, Italy.
  • J. Apresjan, I. Boguslavsky, L. Iomdin, A. Lazursky, V. Sannikov, V. Sizov, L. Tsinman (2003) ETAP-3 linguistic processor: A full-fledged NLP implementation of the MTT, in: Proceedings of the 1st International Conference on Meaning-Text Theory, pp. 279-288.
  • J. Apresjan, I. Boguslavsky, L. Iomdin, A. Lazursky, V. Sannikov, L. Tsinman (1992) The Linguistics of a Machine Translation System, Meta, vol. 37, no. 1, pp. 97-112.
  • G. Aston, L. Burnard (1998) The BNC Handbook: Exploring the British National Corpus with SARA, Edinburgh University Press, Edinburgh, UK.
  • S. Atkins (1991) Tools for computer-aided corpus lexicography: The Hector project, Acta Linguistica Hungarica, vol. 41, pp. 5-72.
  • C. F. Baker, C. J. Fillmore, B. Cronin (2003) The structure of the FrameNet database, International Journal of Lexicography, vol. 16, no. 3, pp. 281-296.
  • C. F. Baker, C. J. Fillmore, J. B. Lowe (1998) The Berkeley FrameNet Project, in: ACL (1998), pp. 86-90.
  • M. Bańko (ed.) (2000) Inny słownik języka polskiego, Wydawnictwo Naukowe PWN, Warsaw.
  • E. Bejček, P. Möllerova, P. Straňák (2006) Lexico-Semantic Annotation of PDT: Some Results, Problems and Solutions, in: P. Sojka, I. Kopeček, K. Pala (eds.), Proceedings of the 9th International Conference on Text, Speech and Dialogue, vol. 4188 of Lecture Notes in Artificial Intelligence, pp. 21-28, Springer-Verlag, Brno, Czech Republic.
  • D. Biber (1988) Variation across speech and writing, Cambridge University Press, New York, NY.
  • D. Biber (1995) Dimensions of register variation, Cambridge University Press, New York, NY.
  • D. Biber (2007) Representativeness in corpus design, in: Teubert and Krishnamurthy (2007), pp. 134-165.
  • H. Bickel, M. Gasser, A. H. Buhofer, L. Hofer, C. Schön (2009) Schweizer Text Korpus - Theoretische Grundlagen, Korpusdesign und Abfragemöglichkeiten, Linguistik online, vol. 39, no. 3.
  • I. Boguslavsky, I. Chardin, S. Grigorjeva, N. Grigoriev, L. Iomdin, L. Kreidlin, N. Frid (2002) Development of a dependency treebank for Russian and its possible applications in NLP, in: LREC (2002), pp. 852-856.
  • I. Boguslavsky, S. Grigorjeva, N. Grigorjev, L. Kreidlin, N. Frid (2000) Dependency treebank for Russian: Concept, tools, types of information, in: COLING (2000), pp. 987-991.
  • A. Böhmova, E. Hajičova, J. Hajič, B. Hladka (2003) The Prague Dependency Treebank: A three-level annotation scenario, in: Abeille (2003).
  • S. Brants, S. Dipper, S. Hansen, W. Lezius, G. Smith (2002) The TIGER Treebank, in: Proceedings of the Workshop on Treebanks and Linguistic Theories, Sozopol, Bulgaria.
  • S. Brants, S. Hansen (2002) Developments in the TIGER Annotation Scheme and their Realization in the Corpus, in: LREC (2002), pp. 1643-1649.
  • T. Brants (1997a) Internal and External Tagsets in Part-of-Speech Tagging, in: Proceedings of Eurospeech, pp. 2787-2790, Rhodes, Greece.
  • T. Brants (1997b) The NEGRA Export Format, CLAUS Report 98, Universität des Saarlandes, Computerlinguistik, Saarbrücken, Germany.
  • T. Brants (2000) Inter-Annotator Agreement for a German Newspaper Corpus, in: LREC (2000), pp. 1435-1439.
  • T. Brants, O. Plaehn (2000) Interactive Corpus Annotation, in: LREC (2000), pp. 453-459.
  • T. Brants, W. Skut (1998) Automation of Treebank Annotation, in: Proceedings of New Methods in Language Processing NeMLaP-98, pp. 49-57, Sydney, Australia.
  • T. Brants, W. Skut, H. Uszkoreit (1999) Syntactic Annotation of a German Newspaper Corpus, in: Proceedings of the ATALA Treebank Workshop, pp. 69-76, Paris, France.
  • T. Brants, W. Skut, H. Uszkoreit (2003) Syntactic Annotation of a German Newspaper Corpus, in: Abeille (2003).
  • E. Brill (1993) A Corpus-Based Approach to Language Learning, PhD thesis, University of Pennsylvania.
  • B. Broda, M. Piasecki, A. Radziszewski (2008) Towards a set of general purpose morphosyntactic tools for Polish, in: M. A. Kłopotek, A. Przepiórkowski, S. T. Wierzchoń (eds.), Proceedings of the Intelligent Information Systems XVI (IIS'08), Challenging Problems in Science: Computer Science, pp. 441-450, Akademicka Oficyna Wydawnicza Exit, Zakopane.
  • L. Burnard (2007) Where did we go wrong? A retrospective look at the British National Corpus, in: Teubert and Krishnamurthy (2007), pp. 35-54.
  • T. By (2009) The TiGer Dependency Bank in Prolog Format, in: M. A. Kłopotek, A. Przepiórkowski, S. T. Wierzchoń, K. Trojanowski (eds.), Recent Advances in Intelligent Information Systems, Challenging Problems in Science: Computer Science, pp. 119-129, Akademicka Oficyna Wydawnicza Exit, Warsaw.
  • J. B. Carroll, P. Davies, B. Richman (1971) The American Heritage Word Frequency Book, American Heritage Publishing Co., New York, NY.
  • D. Cavar, A. Geyken, G. Neumann (2000) Digital Dictionary of the 20th Century German Language, in: T. Erjavec, J. Gros (eds.), Proceedings of the Language Technologies Conference, Ljubljana, Slovenia.
  • F. Čermak (1997) Czech National Corpus: A Case Study in Many Contexts, International Journal of Corpus Linguistics, vol. 2, pp. 181-197.
  • F. Čermak (1998) Czech National Corpus: Its Character, Goal and Background, in: Sojka et al. (1998), pp. 9-14.
  • F. Čermak (2001) Language Corpora: The Czech Case, in: Matoušek et al. (2001), pp. 21-30.
  • K. Church (1988) A stochastic parts program and noun phrase parser for unrestricted text, in: Proceedings of the 2nd ACL Conference on Applied Natural Language Processing (ANLP-88), pp. 136-143, Austin, TX.
  • L. Clement, A. Kinyon (2000) Chunking, marking and searching a morpho-syntactically annotated corpus for French, in: Proceedings ACIDCA'2000, Monastir, Tunisia.
  • COLING (2000) Proceedings of the 18th International Conference on Computational Linguistics (COLING-2000), Saarbrücken, Germany.
  • M. J. Collins, J. Hajič, E. Brill, L. Ramshaw, C. Tillmann (1999) A Statistical Parser of Czech, in: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL'99), pp. 397-404, College Park, MD.
  • S. Dipper (2000) Grammar-based Corpus Annotation, in: A. Abeille, T. Brants, H. Uszkoreit (eds.), Proceedings of the Second Workshop on Linguistically Interpreted Corpora (LINC), pp. 56-64, Luxembourg.
  • Ł. Dębowski (2003) A reconfigurable stochastic tagger for languages with complex tag structure, in: Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2003), pp. 63-70, Budapest, Hungary.
  • Ł. Dębowski (2004) Trigram morphosyntactic tagger for Polish, in: M. A. Kłopotek, S. T. Wierzchoń, K. Trojanowski (eds.), Proceedings of the Intelligent Information Systems New Trends in Intelligent Information Processing and Web Mining IIS:IIPWM'04, Advances in Soft Computing, pp. 409-413, Springer-Verlag, Zakopane.
  • T. Erjavec (2001) The MULTEXT-East Resources Revisited, ElsNews, vol. 10, no. 3-2.
  • T. Erjavec, C. Krstev, V. Petkevič, K. Simov, M. Tadić, D. Vitas (2003) The MULTEXT-East Morphosyntactic Specifications for Slavic Languages, in: Proceedings of the EACL 2003 Workshop on Morphological Processing of Slavic Languages, pp. 25-32, Budapest, Hungary.
  • C. Fellbaum (ed.) (1998) WordNet: An Electronic Lexical Database, MIT Press, Cambridge, MA.
  • M. Filipenko, E. Paducheva, E. Rakhilina (1992) Semantic dictionary viewed as a lexical database, in: Proceedings of the 14th International Conference on Computational Linguistics (COLING-1992), pp. 1295-1299, Nantes, France.
  • C. J. Fillmore, N. Ide, D. Jurafsky, C. Macleod (1998) An American National Corpus: A Proposal, in: LREC (1998), pp. 965-969.
  • C. J. Fillmore, C. R. Johnson, M. R. L. Petruck (2003) Background to FrameNet, International Journal of Lexicography, vol. 16, no. 3, pp. 235-250.
  • C. J. Fillmore, C. Wooters, C. F. Baker (2001) Building a large lexical databank which provides deep semantics, in: B. K. Tsou, O. Y. Kwong (eds.), Proceedings of the Pacific Asian Conference on Language, Information and Computation, pp. 3-25, Hong Kong.
  • M. Forst, N. Bertomeu, B. Crysmann, F. Fouvry, S. Hansen-Schirra, V. Cordoni (2004) Towards a dependency-based gold standard for German parsers - The TiGer Dependency Bank, in: Proceedings of the COLING Workshop on Linguistically Interpreted Corpora, pp. 31-37, Geneva, Switzerland.
  • W. N. Francis (2007) Problems of assembling and computerizing large corpora, in: Teubert and Krishnamurthy (2007), pp. 285-298.
  • W. N. Francis, H. Kucera (1964, revised edition 1979) Brown Corpus Manual, online.
  • R. Garside (1996) The Robust Tagging of Unrestricted Text: the BNC experience, in: J. Thomas, M. Short (eds.), Using Corpora for Language Research: Studies in Honour of Geoffrey Leech, pp. 167-180, Longman, Harlow.
  • E. Grishina, E. Rakhilina (2005) Russian National Corpus (RNC): an overview and perspectives, in: Proceedings of the AATSEEL 2005.
  • K. Głowińska, A. Przepiórkowski (2010) The Design of Syntactic Annotation Levels in the National Corpus of Polish, in: LREC (2010).
  • J. Hajič (1998) Building a Syntactically Annotated Corpus, in: E. Hajičova (ed.), Issues of Valency and Meaning, pp. 106-132, Charles University, Prague, Czech Republic.
  • J. Hajič (2005) Complex Corpus Annotation: The Prague Dependency Treebank, in: M. Šimkova (ed.), Insight into Slovak and Czech Corpus Linguistics, pp. 54-73, Veda, Bratislava, Slovakia.
  • J. Hajič, P. Krbec, P. Kvĕtoň, K. Oliva, V. Petkevič (2001) Serial Combinations of Rules and Statistics: A Case Study in Czech Tagging, in: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL'01), pp. 260-267, Toulouse, France.
  • E. Hajičova (1998) The Prague Dependency Treebank: From Analytic to Tectogrammatical Annotation, in: Sojka et al. (1998), pp. 45-50.
  • E. Hajičova (1999) The Prague Dependency Treebank: Crossing the Sentence Boundary, in: V. Matoušek, P. Mautner, J. Ocelikova, P. Sojka (eds.), Proceedings of the 2nd International Workshop on Text, Speech and Dialogue, pp. 20-27, Springer-Verlag, Berlin.
  • E. Hajičova, J. Hajič, B. Hladka, P. Pajas, V. Řeznickova, P. Sgall (2001) The Current Status of the Prague Dependency Treebank, in: Matoušek et al. (2001), pp. 11-20.
  • E. Hajičova, B. Hladka (1998) Tagging Inflective Languages: Prediction of Morphological Categories for a Rich, Structured Tagset, in: ACL (1998), pp. 483-490.
  • E. Hajičova, B. H. Partee, P. Sgall (1998) Topic-Focus Articulation, Tripartite Structures and Semantic Content, vol. 71 of Studies in Linguistics and Philosophy, Kluwer Academic Publishers, Dordrecht, Netherlands.
  • D. Hindle (1989) Acquiring disambiguation rules from text, in: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics (ACL'89), pp. 118-125, Vancouver, Canada.
  • N. Ide (1998a) Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora, in: LREC (1998), pp. 463-470.
  • N. Ide (1998b) Encoding Linguistic Corpora, in: Proceedings of the Sixth Workshop on Very Large Corpora, pp. 9-17.
  • N. Ide, P. Bonhomme, L. Romary (2000) XCES: An XML-based Encoding Standard for Linguistic Corpora, in: LREC (2000), pp. 825-830.
  • N. Ide, R. Reppen, K. Suderman (2002) The American National Corpus: More Than the Web Can Provide, in: LREC (2002), pp. 839-844.
  • N. Ide, K. Suderman (2004) The American National Corpus First Release, in: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004), pp. 1681-1684, Lisbon, Portugal.
  • S. Johansson, G. N. Leech, H. Goodluck (1978) Manual of information to accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital computers, Department of English, University of Oslo, Oslo, Norway.
  • A. Kinyon (2000) Shallow parsing French using function words as triggers, Tech. rep.
  • E. König, W. Lezius (2000) A description language for syntactically annotated corpora, in: COLING (2000), pp. 1056-1060.
  • E. König, W. Lezius (2003) The TIGER language - A Description Language for Syntax Graphs, Formal Definition, Tech. rep., Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.
  • E. König, W. Lezius, H. Voormann (2003) TIGERSearch User's Manual, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, Stuttgart, Germany.
  • W. Kopaliński (1968) Słownik wyrazów obcych i zwrotów obcojęzycznych, Wiedza Powszechna, Warsaw.
  • H. Kucera, W. N. Francis (1967) Computational Analysis of Present-Day American English, Brown University Press, Providence, RI.
  • I. Kurcz, A. Lewicki, J. Sambor, K. Szafran, J. Woronczak (eds.) (1990) Słownik frekwencyjny języka polskiego, Instytut Języka Polskiego PAN, Kraków.
  • G. Kustova, O. Lashevskaja, E. Rakhilina, E. Paducheva (2007) On Taxonomy in Cognitive Semantics and Corpus Linguistics: Parts of Body, in: Proceedings of the 10th International Cognitive Conference, Kraków.
  • L. Kučova, Z. Žabokrtsky (2005) Anaphora in Czech: Large Data and Experiments with Automatic Anaphora Resolution, in: Matoušek et al. (2005), pp. 93-98.
  • O. Lashevskaja (2006) Corpus-aided Construction Grammar: Semantic Tools in the Russian National Corpus, in: Proceedings of the 2nd International Meeting of the German Cognitive Linguistic Association, Munich, Germany.
  • W. Lezius, E. König (2000) Towards a search engine for syntactically annotated corpora, in: W. Zühlke, E. G. Schukat-Talamazzini (eds.), Konvens 2000 Sprachkommunikation, pp. 113-116, VDE-Verlag, Ilmenau, Germany.
  • M. Liberman (1989) Text on Tap: the ACL Data Collection Initiative, in: Proceedings of the DARPA Workshop on Speech and Natural Language, pp. 173-188, Morgan Kaufmann.
  • LREC (1998) Proceedings of the 1st International Conference on Language Resources and Evaluation (LREC-1998), Granada, Spain.
  • LREC (2000) Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC-2000), Athens, Greece.
  • LREC (2002) Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC-2002), Las Palmas, Spain.
  • LREC (2010) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC-2010), ELRA, Valletta, Malta.
  • H.-D. Maas (1996) MPRO: Ein System zur Analyse und Synthese deutscher Wörter, in: R. Hausser (ed.), Linguistische Verifikation, Sprache und Information, 34, Max Niemeyer Verlag, Tübingen, Germany.
  • H.-D. Maas (1998) Multilinguale Textverarbeitung mit MPRO, in: Proceedings of the Europäische Kommunikationskybernetik heute und morgen, Paderborn, Germany.
  • H.-D. Maas, C. Rösener, A. Theofilidis (2009) Morphosyntactic and Semantic Analysis of Text: The MPRO Tagging Procedure, in: C. Mahlow, M. Piotrowski (eds.), State of the art in computational morphology. Proceedings of the Workshop on systems and frameworks for computational morphology (SFCM 2009), vol. 41 of Communications in computer and information science, pp. 76-87, Springer-Verlag.
  • M. Marciniak (ed.) (2010) Anotowany korpus dialogów telefonicznych, Problemy Współczesnej Nauki. Teoria i Zastosowania: Inżynieria Lingwistyczna, Akademicka Oficyna Wydawnicza Exit, Warsaw.
  • M. P. Marcus (1994) The Penn TreeBank: A revised corpus design for extracting predicate-argument structure, in: Proceedings of the ARPA Human Language Technology Workshop, Morgan Kaufmann, Princeton, NJ.
  • M. P. Marcus, B. Santorini, M. A. Marcinkiewicz (1993) Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics, vol. 19, no. 2, pp. 313-330.
  • V. Matoušek, P. Mautner, R. Mouček, K. Taušer (eds.) (2001) Proceedings of the 4th International Conference on Text, Speech and Dialogue, vol. 2166 of Lecture Notes in Artificial Intelligence, Springer-Verlag, Železná Ruda, Czech Republic.
  • V. Matoušek, P. Mautner, T. Pavelka (eds.) (2005) Proceedings of the 8th International Conference on Text, Speech and Dialogue, vol. 3658 of Lecture Notes in Artificial Intelligence, Springer-Verlag, Karlovy Vary, Czech Republic.
  • I. Mel'čuk (1988) Dependency Syntax: Theory and Practice, State University of New York Press, Albany, NY.
  • I. Mel'čuk, A. Zholkovsky (1984) Explanatory Combinatorial Dictionary of Modern Russian, Wiener Slawistischer Almanach, Vienna, Austria.
  • A. Mengel, W. Lezius (2000) An XML-based encoding format for syntactically annotated corpora, in: LREC (2000), pp. 121-126.
  • G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, K. J. Miller (1990) Introduction to WordNet: an on-line lexical database, International Journal of Lexicography, vol. 3, no. 4, pp. 235-244.
  • G. A. Miller, C. Leacock, R. Tengi, R. Bunker (1993) A semantic concordance, in: Proceedings of the ARPA Human Language Technology Workshop, pp. 303-308, Plainsboro, NJ.
  • S. M. Newman, R. W. Swanson, K. Knowlton (1959) A Notation System for Transliterating Technical and Scientific Texts for Use in Data Processing Systems, Tech. rep. 15, U.S. Department of Commerce.
  • J. Nivre, I. Boguslavsky, L. Iomdin (2008) Parsing the SynTagRus Treebank, in: Proceedings of the 22nd International Conference on Computational Linguistics (COLING-2008), pp. 641-648, Manchester, UK.
  • P. Pajas, J. Štepanek (2005) A Generic XML-based Format for Structured Linguistic Annotation and its Application to Prague Dependency Treebank 2.0, Tech. rep. TR-2005-29, ÚFAL MFF UK, Prague, Czech Republic.
  • M. Piasecki, B. Gaweł (2005) A Rule-based Tagger for Polish Based on Genetic Algorithm, in: M. A. Kłopotek, S. T. Wierzchoń, K. Trojanowski (eds.), Proceedings of the Intelligent Information Systems New Trends in Intelligent Information Processing and Web Mining IIS:IIPWM'05, Advances in Soft Computing, pp. 247-256, Springer-Verlag, Gdańsk.
  • O. Plaehn, T. Brants (2000) Annotate - An Efficient Interactive Annotation Tool, in: Proceedings of the 6th ACL Conference on Applied Natural Language Processing (ANLP-2000), Seattle, WA.
  • A. Przepiórkowski (2004) Korpus IPI PAN. Wersja wstępna, Instytut Podstaw Informatyki, Polska Akademia Nauk, Warsaw.
  • A. Przepiórkowski (2009) A comparison of two morphosyntactic tagsets of Polish, in: V. Koseska-Toszewa, L. Dimitrova, R. Roszko (eds.), Proceedings of the 4th MONDILEX Open Workshop on Representing Semantics in Digital Lexicography, pp. 138-144.
  • A. Przepiórkowski, P. Bański, Ł. Dębowski, E. Hajnicz, M. Woliński (2003) Konstrukcja korpusu IPI PAN, Polonica, vol. XXII-XXIII, pp. 33-38.
  • A. Przepiórkowski, R. L. Górski, B. Lewandowska-Tomaszczyk, M. Łaziński (2008) Towards the National Corpus of Polish, in: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC-2008), ELRA, Marrakech, Morocco.
  • A. Przepiórkowski, R. L. Górski, M. Łaziński, P. Pęzik (2009) Recent developments in the National Corpus of Polish, in: J. Levická, R. Garabík (eds.), Proceedings of the 5th International Conference on NLP, Corpus Linguistics, Corpus Based Grammar Research (Slovko 2009), pp. 302-309.
  • A. Przepiórkowski, R. L. Górski, M. Łaziński, P. Pęzik (2010) Recent Developments in the National Corpus of Polish, in: LREC (2010).
  • A. Przepiórkowski, G. Murzynowski (2009) Manual Annotation of the National Corpus of Polish with Anotatornia, in: S. Goźdź-Roszkowski (ed.), Practical Applications in Language Corpora (PALC'09), Peter Lang, Frankfurt am Main.
  • A. Przepiórkowski, M. Woliński (2003a) A Flexemic Tagset for Polish, in: Proceedings of the Workshop on Morphological Processing of Slavic Languages, EACL-2003, pp. 33-40.
  • A. Przepiórkowski, M. Woliński (2003b) A Morphosyntactic Tagset for Polish, in: P. Kosta, J. Błaszczak, J. Frasek, L. Geist, M. Zygis (eds.), Investigations into Formal Slavic Linguistics, pp. 349-362, Peter Lang.
  • M. Razimova, Z. Žabokrtsky (2005) Morphological Meanings in the Prague Dependency Treebank 2.0, in: Matoušek et al. (2005), pp. 148-155.
  • R. Reppen, N. Ide (2004) The American National Corpus: Overall goals and the first release, Journal of English Linguistics, vol. 32, no. 2, pp. 105-113.
  • B. Santorini, M. A. Marcinkiewicz (1991) Bracketing Guidelines for the Penn Treebank Project, Tech. rep., Department of Computer and Information Science, University of Pennsylvania.
  • A. Savary, J. Piskorski (2009) Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish, in: M. A. Kłopotek, M. Marciniak, A. Mykowiecka, W. Penczek, S. T. Wierzchoń (eds.), Intelligent Information Systems, Challenging Problems in Science: Computer Science, pp. 141-154, Akademicka Oficyna Wydawnicza Exit, Warsaw.
  • A. Savary, J. Waszczuk, A. Przepiórkowski (2010) Towards the Annotation of Named Entities in the National Corpus of Polish, in: LREC (2010).
  • P. Sgall, E. Hajičova, J. Panevova (1986) The Meaning of the Sentence in Its Semantic and Pragmatic Aspects, D. Reidel, Dordrecht, Netherlands.
  • S. Sharoff (2004) Methods and tools for development of the Russian Reference Corpus, in: A. Wilson, D. Archer, P. Rayson (eds.), Corpus Linguistics Around the World, vol. 56 of Language and Computers. Studies in Practical Linguistics, pp. 167-180, Rodopi, Amsterdam, Netherlands.
  • W. Skut, T. Brants, B. Krenn, H. Uszkoreit (1998) A Linguistically Interpreted Corpus of German Newspaper Text, in: Proceedings of the ESSLLI Workshop on Recent Advances in Corpus Annotation, pp. 705-711, Saarbrücken, Germany.
  • W. Skut, B. Krenn, T. Brants, H. Uszkoreit (1997) An Annotation Scheme for Free Word Order Languages, in: Proceedings of the 5th ACL Conference on Applied Natural Language Processing (ANLP-97), pp. 88-96, Washington, DC.
  • P. Smrž (2004) Quality Control for Wordnet Development, in: P. Sojka, K. Pala, P. Smrž, C. Fellbaum, P. Vossen (eds.), Proceedings of the 2nd International WordNet Conference (GWC 2004), pp. 206-212, Masaryk University, Brno, Czech Republic.
  • P. Sojka, V. Matoušek, P. Mautner, K. Pala, I. Kopeček (eds.) (1998) Proceedings of the 1st International Workshop on Text, Speech and Dialogue, Masaryk University, Brno, Czech Republic.
  • M. Spevack (1968-70) Complete and Systematic Concordance to the works of Shakespeare, G. Olms, Hildesheim, Germany.
  • M. Spevack (1972) Shakespeare English: The Core Vocabulary, RNL, vol. 3, no. ii, pp. 106-122.
  • J. Stein (ed.) (1967) The Random House Dictionary of the English Language, Random House, New York, NY.
  • M. Szupryczyńska (1973) Syntaktyczna klasyfikacja czasowników przybiernikowych, Państwowe Wydawnictwo Naukowe, Poznań.
  • A. Taylor, M. P. Marcus, B. Santorini (2003) The Penn Treebank: An Overview, in: Abeille (2003), pp. 5-22.
  • TEI P5 (2008) TEI P5: Guidelines for Electronic Text Encoding and Interchange, online.
  • W. Teubert, R. Krishnamurthy (eds.) (2007) Corpus Linguistics, Critical Concepts in Linguistics, Routledge, Abingdon, UK; New York, NY.
  • M. Woliński (2006) Morfeusz - a Practical Tool for the Morphological Analysis of Polish, in: M. A. Kłopotek, S. T. Wierzchoń, K. Trojanowski (eds.), Proceedings of the Intelligent Information Systems New Trends in Intelligent Information Processing and Web Mining IIS:IIPWM'06, Advances in Soft Computing, pp. 503-512, Springer-Verlag, Ustroń.
  • A. Zalizniak (1977) Grammaticzeskij slovar' russkogo jazyka, Russkij Jazyk, Moscow, Russia.
  • H. Zinsmeister, J. Kuhn, S. Dipper (2001a) From LFG Structures to TIGER Treebank Annotations, in: Proceedings of the Third Workshop on Linguistically Interpreted Corpora (LINC 2001), Leuven, Belgium.
  • H. Zinsmeister, J. Kuhn, B. Schrader, S. Dipper (2001b) TIGER Transfer - From LFG Structures to the TIGER Treebank, Tech. rep., Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart.
  • G. K. Zipf (1935) The Psycho-Biology of Language: An Introduction to Dynamic Philology, Houghton Mifflin, Boston, MA.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUJ8-0024-0070