Search results - BazTech

1

A French corpus annotated for multiword expressions and named entities

Candito Marie, Constant Mathieu, Ramisch Carlos, Savary Agata, Guillaume Bruno, Parmentier Yannick, Cordeiro Silvio Ricardo

Journal of Language Modelling | 2020 | Vol. 8, No. 2 | 415--479

EN

We present the enrichment of a French treebank of various genres with a new annotation layer for multiword expressions (MWEs) and named entities (NEs). Our contribution with respect to previous work on NE and MWE annotation is the particular care taken to use formal criteria, organized into decision flowcharts, shedding some light on the interactions between NEs and MWEs. Moreover, in order to cope with the well-known difficulty of drawing a clear-cut frontier between compositional expressions and MWEs, we chose to use sufficient criteria only. As a result, annotated MWEs satisfy a varying number of sufficient criteria, accounting for the scalar nature of the MWE status. In addition to the span of the elements, annotation includes the subcategory of NEs (e.g., person, location) and one matching sufficient criterion for non-verbal MWEs (e.g., lexical substitution). The 3,099 sentences of the treebank were double-annotated and adjudicated, and we paid attention to cross-type consistency and compatibility with the syntactic layer. Overall inter-annotator agreement on non-verbal MWEs and NEs reached 71.1%. The released corpus contains 3,112 annotated NEs and 3,440 MWEs, and is distributed under an open license.

2

Design and analysis of a lean interface for Sanskrit corpus annotation

Goyal P., Huet G.

Journal of Language Modelling | 2016 | Vol. 4, No. 2 | 145--182

EN

We describe an innovative computer interface designed to assist annotators in the efficient selection of segmentation solutions for proper tagging of Sanskrit corpora. The proposed solution uses a compact representation of the shared forest of all segmentations. The main idea is to represent the union of all segmentations, abstracting from the sandhi rules used, and aligning with the input sentence. We show that this representation provides an exponential saving, in both space and time. The segmentation methodology is lexicon-directed. When the lexicon does not have full coverage of the corpus vocabulary, some chunks of the input may fail to be recognized. We designed a lexicon-acquisition facility, which remedies this incompleteness and makes the interface more robust. This interface has been implemented, and is currently being applied to the annotation of the Sanskrit Library corpus. Evaluation over 1,500 sentences from the Pañcatantra text shows the effectiveness of the proposed interface on real corpus data.
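The saving mentioned in this abstract can be illustrated with a toy sketch. It is not the authors' data structure: it ignores sandhi entirely, treats the input as a plain string, and uses a made-up lexicon. The function names `segment_edges` and `count_segmentations` are hypothetical. The sketch only shows that the set of word spans aligned with the input (the union of all segmentations) can stay small while the number of distinct segmentations grows very quickly.

```python
def segment_edges(text, lexicon):
    """Return every (start, end) span that is a lexicon word and lies on
    at least one complete segmentation of `text`.

    Assumption: plain string matching, no sandhi rules.
    """
    n = len(text)
    # forward[i]: position i is reachable from 0 by a chain of lexicon words
    forward = [False] * (n + 1)
    forward[0] = True
    for i in range(n):
        if forward[i]:
            for j in range(i + 1, n + 1):
                if text[i:j] in lexicon:
                    forward[j] = True
    # backward[i]: the end of the text is reachable from position i
    backward = [False] * (n + 1)
    backward[n] = True
    for i in range(n - 1, -1, -1):
        backward[i] = any(
            backward[j] and text[i:j] in lexicon for j in range(i + 1, n + 1)
        )
    # Keep only spans that start at a reachable position and end at a
    # position from which the segmentation can be completed.
    return [
        (i, j)
        for i in range(n)
        if forward[i]
        for j in range(i + 1, n + 1)
        if backward[j] and text[i:j] in lexicon
    ]


def count_segmentations(text, lexicon):
    """Count complete segmentations with dynamic programming."""
    n = len(text)
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has exactly one segmentation
    for i in range(n - 1, -1, -1):
        ways[i] = sum(
            ways[j] for j in range(i + 1, n + 1) if text[i:j] in lexicon
        )
    return ways[0]


if __name__ == "__main__":
    toy_lexicon = {"a", "aa"}  # hypothetical two-word lexicon
    sentence = "a" * 10
    print(len(segment_edges(sentence, toy_lexicon)))   # shared spans
    print(count_segmentations(sentence, toy_lexicon))  # distinct segmentations
```

With this toy lexicon the number of shared spans grows linearly with the input length, whereas the number of segmentations follows the Fibonacci sequence.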

3

The method of automatic summarization from different sources

Shakhovska N., Cherna T.

ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes | 2016 | Vol. 5, No. 1 | 103--109

EN

This article analyzes the technology of automatic text abstracting and annotation. The role of annotation in the automatic search and classification of scientific articles is described. An algorithm for summarizing natural language documents using the concept of importance coefficients is developed. This concept makes it possible to account for the peculiarities of subject areas and topics found in different kinds of documents. A method for generating abstracts of a single document based on frequency analysis is developed. The recognition elements for unstructured text analysis are given. A method for the pre-processing analysis of several documents is developed. This technique considers both statistical approaches to abstracting and the importance of terms in a particular subject domain. The quality of the generated abstracts is evaluated. An expert evaluation of the developed system was conducted, for texts in Ukrainian only. The summary produced by the developed system has a higher aggregate score on all criteria. The architecture of the summarization system is built. The information system model was built with the CASE tool AllFusion ERwin Data Modeler, and a database schema for storing information was created. The system is designed to work primarily with Ukrainian texts, which gives a significant advantage, since most modern systems are still oriented toward English texts.
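The combination of frequency analysis and domain-specific importance coefficients described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the scoring formula (mean of word frequency times coefficient), the `summarize` function, and the `importance` dictionary are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def summarize(text, importance=None, max_sentences=2):
    """Pick the highest-scoring sentences of `text`, in original order.

    Assumptions (not from the source): a sentence's score is the mean of
    word_frequency * importance_coefficient over its words, and words
    without a coefficient get the neutral weight 1.0.
    """
    importance = importance or {}
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()
    ]
    # \w is Unicode-aware in Python 3, so Cyrillic text is tokenized too.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(word for words in tokenized for word in words)

    def score(words):
        if not words:
            return 0.0
        total = sum(freq[w] * importance.get(w, 1.0) for w in words)
        return total / len(words)

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(tokenized[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = "Cats sleep. Dogs bark loudly. Cats purr."
    print(summarize(sample))
    # A hypothetical domain coefficient that boosts one term.
    print(summarize(sample, importance={"dogs": 10.0}, max_sentences=1))
```

Raising the coefficient of a term shifts the selection toward sentences that contain it, which is one simple way to reflect the subject area of a document.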

4

Java SAM Typed Closures : A Sound and Complete Type Inference System for Nominal Types

Bellia M., Occhiuto M.E.

Fundamenta Informaticae | 2013 | Vol. 128, No. 1-2 | 17--33

EN

The latest proposal for Java closures, as it emerged in JSR 000335, is innovative mainly in: (1) the use of nominal types (SAM types) for closures; (2) the introduction of target types and compatibility for contextual typing of closures; (3) the need for type inference that reconstructs the omitted type annotations of closures and closure arguments. The paper provides a sound and complete type system, with nominal types, for such type inference and discusses the role and formalization of targeting and compatibility in the designed inference process.

5

KIS: An automated attribute induction method for classification of DNA sequences

Biedrzycki R., Arabas J.

International Journal of Applied Mathematics and Computer Science | 2012 | Vol. 22, No. 3 | 711--721

EN

This paper presents an application of methods from the machine learning domain to solving the task of DNA sequence recognition. We present an algorithm that learns to recognize groups of DNA sequences sharing common features such as sequence functionality. We demonstrate application of the algorithm to find splice sites, i.e., to properly detect donor and acceptor sequences. We compare the results with those of reference methods that have been designed and tuned to detect splice sites. We also show how to use the algorithm to find a human readable model of the IRE (Iron-Responsive Element) and to find IRE sequences. The method, although universal, yields results which are of quality comparable to those obtained by reference methods. In contrast to reference methods, this approach uses models that operate on sequence patterns, which facilitates interpretation of the results by humans.

6

Automatic genome annotation as a tool of systems biology [Automatyczna anotacja genomu jako narzędzie biologii systemów]

Bizukojć M.

Inżynieria i Aparatura Chemiczna | 2009 | No. 3 | 25--27

EN

A method for analysing the metabolism of organisms is presented, based on reconstructing the metabolic network from a fully or partially sequenced genome. The analysis was performed for seven species of filamentous fungi of the genus Aspergillus using an automatic annotation server, and the results were compared with selected physiological data.