Results found: 5

Search results
Searched for:
in keywords: corpus linguistics
1
Re-research.pl: where Humanities Meet Computer Science
EN
The article discusses selected digital humanities projects carried out by the Re-research.pl group. The group consists of researchers from the Institute of Linguistics and the Department of Natural Language Processing at Adam Mickiewicz University, Poznań, Poland. The projects discussed are: National Photocorpus of Polish; Discovermat; Korea, Koreans and ‘Koreanity’ in the digitised Polish press of the 20th century; Biography of the Nation; 100,000 ministories; Gonito.net; and 50,000 words. Domain and chronologisation index. The main focus of the article, however, is the interdisciplinary popular-science blog Re-research.pl. The daily blog posts cover a variety of subjects, ranging from linguistics, history and folklore to computer science. Selected posts and categories of posts are discussed, such as chronologisation challenges, texts devoted to folklore, and materials on the structure of text files. Apart from providing daily analyses, the blog promotes other projects and serves as a platform for dialogue between representatives of various fields.
2
Korpusomat: a Tool for Creating Searchable Morphosyntactically Tagged Corpora
EN
The paper presents Korpusomat, a web application for building annotated corpora for corpus linguistic studies. Korpusomat combines existing tools, such as a morphological analyser, a tagger and a corpus search engine, and provides an easy-to-use environment for building corpora technically compatible with the National Corpus of Polish from almost any text, including texts in binary formats. In the paper we present the current state of the project, its features and functionalities, as well as some future plans and development tasks. A usage example is also presented.
3
Extraction of Polish noun senses from large corpora by means of clustering
EN
We investigate two methods of identifying noun senses, based on clustering of lemmas and of documents. We have adapted the well-known Clustering by Committee algorithm to Polish and tested it on very large Polish corpora. The evaluation, by means of a WordNet-based synonymy test, used the Polish wordnet (plWordNet 1.0). Various clustering algorithms were analysed for extracting document clusters as indicators of the senses of the words which occur in them. The two approaches to word-sense identification have been compared, and conclusions drawn.
EN
We present a method for structural collocation extraction for an inflective language (Polish), based on a process divided into two phases: (1) extraction and filtering of pairs of lemmatised word forms and (2) structural annotation of the extracted collocations with lexico-syntactic patterns. The pattern templates and parameters are specified manually, but their instances are both generated and tested on the corpus automatically. The extracted collocations were evaluated by applying them as rules in morphosyntactic disambiguation of Polish and by comparing them with a list of two-word expressions extracted from two Polish dictionaries.
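The two-phase idea can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the association measure or the pattern templates, so pointwise mutual information, the adjacency window, the thresholds, the function names and the toy part-of-speech labels below are all assumptions made for illustration.

```python
import math
from collections import Counter

def extract_pairs(sentences, min_count=2, min_pmi=0.0):
    """Phase 1 (assumed form): count adjacent lemma pairs and keep those
    that are frequent enough and score above a PMI threshold."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        lemmas = [lemma for lemma, _ in sent]
        unigrams.update(lemmas)
        bigrams.update(zip(lemmas, lemmas[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    kept = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_indep = (unigrams[a] / n_uni) * (unigrams[b] / n_uni)
        pmi = math.log2(p_pair / p_indep)
        if pmi >= min_pmi:
            kept[(a, b)] = pmi
    return kept

def annotate(sentences, pairs):
    """Phase 2 (assumed form): record which part-of-speech patterns
    each retained pair occurs with in the corpus."""
    patterns = {pair: Counter() for pair in pairs}
    for sent in sentences:
        for (l1, p1), (l2, p2) in zip(sent, sent[1:]):
            if (l1, l2) in patterns:
                patterns[(l1, l2)][(p1, p2)] += 1
    return patterns

# Toy lemmatised corpus of (lemma, part-of-speech) tokens; labels are hypothetical.
CORPUS = [
    [("czerwony", "adj"), ("kapturek", "subst"), ("iść", "verb")],
    [("czerwony", "adj"), ("kapturek", "subst"), ("spać", "verb")],
    [("mały", "adj"), ("dom", "subst"), ("stać", "verb")],
]

pairs = extract_pairs(CORPUS)
patterns = annotate(CORPUS, pairs)
for pair, counts in patterns.items():
    print(pair, dict(counts))
```

Working on lemmas rather than surface forms is what lets a single pair collect evidence from all of its inflected variants, which matters for a language with rich inflection.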
EN
We present a direct method to construct a morpho-syntactic guesser for Polish. Such a guesser produces morpho-syntactic descriptions for word forms unknown to the morphological analyser. The method relies upon a statistical a tergo index, in which pseudosuffixes (endings) extracted from a statistical tree define morpho-syntactic properties of corresponding word forms. The secondary aim is to investigate to what extent it is possible to develop the morphological analysis exclusively on the basis of endings. A statistically extracted a tergo index of Polish word forms was created. Various experiments giving insights into the properties of the index are presented. The method seems to be easily applicable to any other inflectional language with only minor technical changes.
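A rough sketch of an ending-based guesser is given here for illustration. It simplifies the approach considerably: instead of a statistical tree it indexes every word-final substring up to a fixed length, and the function names, the `max_len` parameter, the toy lexicon and the tag labels are all assumptions rather than details from the paper.

```python
from collections import Counter, defaultdict

def build_suffix_index(lexicon, max_len=4):
    """Map each word-final substring (up to max_len characters)
    to a Counter of the tags seen with it."""
    index = defaultdict(Counter)
    for form, tag in lexicon:
        for n in range(1, min(max_len, len(form)) + 1):
            index[form[-n:]][tag] += 1
    return index

def guess_tags(form, index, max_len=4):
    """Return tags, most frequent first, for the longest ending of
    `form` found in the index; an empty list if no ending is known."""
    for n in range(min(max_len, len(form)), 0, -1):
        counts = index.get(form[-n:])
        if counts:
            return [tag for tag, _ in counts.most_common()]
    return []

# Toy lexicon of (word form, tag) pairs; the tag labels are hypothetical.
LEXICON = [
    ("kotami", "subst:pl:inst"),
    ("domami", "subst:pl:inst"),
    ("psami", "subst:pl:inst"),
    ("czytali", "praet:pl"),
    ("pisali", "praet:pl"),
]

index = build_suffix_index(LEXICON)
print(guess_tags("stołami", index))
print(guess_tags("biegali", index))
```

Preferring the longest matching ending trades coverage for precision: longer endings are rarer in the index but more specific about the morphosyntactic properties they imply.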