Search results - BazTech

1

Development of a flexible Mizar tokenizer and parser for information retrieval system

Nakasho Kazuhisa

Annals of Computer Science and Information Systems | 2019 | Vol. 18 | pp. 77--80

EN

In this paper, we describe the development of a new Mizar tokenizer and parser as a component of a search system for the Mizar Mathematical Library. The existing Mizar tokenizer and parser can handle only a complete article written in the Mizar language, whereas the newly developed program can process a snippet of a Mizar article. In particular, because a snippet can be handled without specifying the vocabulary section of the environment part, user input effort is expected to be greatly reduced.

2

Expressibility of many-valued linguistics constructions

Frankowski S.

Journal of Applied Computer Science | 2015 | Vol. 23, no. 2 | pp. 7--19

EN

The main problem addressed in this paper is a comparison of different kinds of many-valued and two-valued linguistic constructions. We approximate probabilistic grammars by grammars that contain rational numbers only. Moreover, it is shown that a probabilistic grammar can be simulated by a regularly controlled grammar.
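To illustrate what replacing real-valued rule probabilities with rational ones can look like, here is a minimal sketch. It assumes a probabilistic grammar stored as a dictionary from each nonterminal to its rule probabilities; each probability is rounded to the nearest fraction with a bounded denominator, and the fractions are then renormalized so that each nonterminal's rules still sum to exactly 1. This is an illustrative construction, not the approximation used in the paper, and the names `rationalize_grammar` and `max_den` are hypothetical.

```python
from fractions import Fraction


def rationalize_grammar(rules, max_den=100):
    """Replace real rule probabilities with rational ones (illustrative only).

    rules: dict mapping a nonterminal to a dict {right-hand side: probability}.
    Each probability is approximated by the closest fraction whose denominator
    is at most max_den; the fractions are then renormalized so that each
    nonterminal's rule probabilities sum to exactly 1. Renormalization may
    produce denominators larger than max_den.
    """
    result = {}
    for lhs, alternatives in rules.items():
        approx = {rhs: Fraction(p).limit_denominator(max_den)
                  for rhs, p in alternatives.items()}
        total = sum(approx.values())
        result[lhs] = {rhs: q / total for rhs, q in approx.items()}
    return result


pcfg = {"S": {("A", "B"): 0.7, ("a",): 0.3}, "A": {("a",): 1.0}}
rational_pcfg = rationalize_grammar(pcfg)
```

For example, the float probabilities 0.7 and 0.3 for the two `S` rules become exactly 7/10 and 3/10, and every nonterminal's rule probabilities sum to exactly 1 as rationals.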

3

Evolutionary grammatical inference (Ewolucyjne wnioskowanie gramatyczne)

Unold O.

Prace Naukowe Instytutu Informatyki, Automatyki i Robotyki Politechniki Wrocławskiej. Monografie | 2006 | Vol. 29, no. 105 | pp. 3--227

EN

The monograph addresses grammatical inference (grammar induction), a subject that is important and fruitful both theoretically and practically. It proposes a new model of evolutionary grammatical inference whose main purpose is context-free grammar induction. The new evolutionary model builds on the learning mechanism used in learning classifier systems. Here the classifiers are the productions of a context-free grammar in Chomsky normal form, and the environment to which the system adapts is a learning set of example sentences, each labeled to indicate whether or not it belongs to the target language. The purpose of learning is to classify the learning sentences correctly. Since the set of classifiers constitutes the set of grammar productions, correctly classifying the labeled sentences amounts to inducing the target grammar. The model tracks the productions used while analyzing the learning set and, once the analysis is complete, computes a fitness value for each production. New productions are discovered during induction by a covering mechanism and a genetic algorithm. The monograph is divided into two parts. The first part introduces grammatical inference, evolutionary computation, and learning classifier systems. In particular, it presents the state of the art in context-free grammar induction, proposes a new categorization of learning classifier systems, and presents their basic models in a uniform framework. The second part proposes an original model of evolutionary grammatical inference dedicated to context-free grammar induction. The architecture and operation of the new model are described in terms of the categories of a learning classifier system. A so-called production fertility mechanism is introduced which, together with a crowding mechanism and an inversion operator, is intended to counteract the high epistasis in the population of productions. New covering operators adapted to the parsing method used are defined, as are estimators of induction accuracy and cost. Induction experiments were carried out on regular languages from the Tomita set, selected formal context-free languages, and large natural language corpora. The experiments showed that, for each class of languages examined, the model achieves results comparable to, and in some cases better than, the best reported in the literature, and not only among evolutionary methods. Computer simulations were also conducted to identify the properties of the proposed evolutionary model experimentally. Beyond the specific conclusions, interesting results were obtained concerning general evolutionary mechanisms, including the influence of tournament selection and crowding on selective pressure and the role of a new full covering operator in the evolution of a learning classifier system population. Besides linguistic engineering, which is examined in the monograph, computational genomics is indicated as a potential application of the model. The tasks of recognizing the human telomere sequence and of searching for promoter regions in E. coli were also investigated. In its current implementation, the model can be applied, at a high level of the specificity estimator, to recognize regions that do not belong to promoter sequences.
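The abstract states that the classifiers are productions in Chomsky normal form and that the model tracks which productions are used while the learning set is analyzed. Because the grammar is in CNF, the classic CYK algorithm is a natural parser for this; the minimal sketch below assumes CYK (the monograph's exact parsing procedure and fitness function are not given here), tests whether a sentence is derivable from a start symbol `S`, and counts how often each production fires in the chart. The grammar encoding, the names `cyk` and `accuracy`, and the toy grammar are illustrative assumptions.

```python
from collections import Counter


def cyk(sentence, unary, binary, start="S"):
    """CYK membership test for a grammar in Chomsky normal form (CNF).

    sentence: sequence of terminals (a string works for one-character terminals)
    unary:    list of (A, a) pairs for productions A -> a
    binary:   list of (A, B, C) triples for productions A -> B C
    Returns (accepted, usage): whether `start` derives the sentence, and a
    Counter of how many times each production was applied while filling
    the chart.
    """
    n = len(sentence)
    if n == 0:
        return False, Counter()  # CNF without S -> epsilon derives no empty string
    # table[i][k] = nonterminals deriving the span of length k + 1 starting at i
    table = [[set() for _ in range(n)] for _ in range(n)]
    usage = Counter()
    for i, symbol in enumerate(sentence):
        for lhs, terminal in unary:
            if terminal == symbol:
                table[i][0].add(lhs)
                usage[(lhs, terminal)] += 1
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            for split in range(1, length):  # length of the left sub-span
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, b, c in binary:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
                        usage[(lhs, b, c)] += 1
    return start in table[0][n - 1], usage


def accuracy(examples, unary, binary):
    """Fraction of (sentence, is_positive) pairs the grammar classifies correctly."""
    correct = sum(cyk(s, unary, binary)[0] == label for s, label in examples)
    return correct / len(examples)


# Toy grammar for a^n b^n (n >= 1): S -> A B | A X, X -> S B, A -> a, B -> b
UNARY = [("A", "a"), ("B", "b")]
BINARY = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]
```

With this toy grammar, `cyk("aabb", UNARY, BINARY)` accepts and `cyk("aab", UNARY, BINARY)` rejects. In a learning-classifier-style learner, counts like `usage`, gathered over labeled positive and negative sentences, are the kind of statistic from which a per-production fitness could be derived.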

4

Measure of regular languages

Surana A., Ray A.

Demonstratio Mathematica | 2004 | Vol. 37, no. 2 | pp. 485--503

EN

This paper reviews and extends recent work on the signed real measure of regular languages within a unified framework. The language measure provides a total ordering of the partially ordered set of sublanguages of a regular language, which allows quantitative evaluation of the controlled behavior of deterministic finite-state automata under different supervisors. The paper presents a procedure by which the performance of different supervisors can be evaluated with a common quantitative tool. Two algorithms for computing the language measure are provided, and their equivalence is established, along with a physical interpretation from a probabilistic perspective.
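A minimal sketch of how such a measure can be computed, assuming the formulation commonly used in this line of work: each defined event σ at state q_i carries a cost π̃(σ, q_i) ≥ 0 with the costs at every state summing to strictly less than 1; each state carries a characteristic weight χ_i in [-1, 1]; Π_ij sums the costs of the events leading from q_i to q_j; and the measure vector satisfies μ = Πμ + χ, i.e. μ = (I - Π)^(-1) χ. The two functions show two ways of solving that system (direct elimination and fixed-point iteration) that agree; they are not necessarily the two algorithms from the paper, and all function names are hypothetical.

```python
def transition_cost_matrix(n, delta, event_cost):
    """Build the n x n matrix Pi for a DFSA with states 0..n-1.

    delta:      dict (state, event) -> next state
    event_cost: dict (state, event) -> cost in [0, 1); at every state the
                costs of the defined events must sum to strictly less than 1.
    """
    pi = [[0.0] * n for _ in range(n)]
    for (q, sigma), q_next in delta.items():
        pi[q][q_next] += event_cost[(q, sigma)]
    return pi


def measure_direct(pi, chi):
    """Solve (I - Pi) mu = chi by Gauss-Jordan elimination with partial pivoting."""
    n = len(chi)
    # Augmented matrix [I - Pi | chi]
    a = [[(1.0 if i == j else 0.0) - pi[i][j] for j in range(n)] + [chi[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def measure_iterative(pi, chi, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration mu <- Pi mu + chi.

    Converges because Pi is nonnegative with row sums below 1, so the
    update is a contraction in the max norm.
    """
    n = len(chi)
    mu = [0.0] * n
    for _ in range(max_iter):
        new = [chi[i] + sum(pi[i][j] * mu[j] for j in range(n)) for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, mu)) < tol:
            return new
        mu = new
    return mu


# Two states, events "a" and "b"; state 0 is "good" (chi = 0.5), state 1 "bad" (chi = -1)
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
COST = {(0, "a"): 0.2, (0, "b"): 0.3, (1, "a"): 0.4, (1, "b"): 0.1}
PI = transition_cost_matrix(2, DELTA, COST)
CHI = [0.5, -1.0]
```

For this example Π = [[0.3, 0.2], [0.4, 0.1]], and both functions give μ ≈ (0.4545, -0.9091), i.e. (5/11, -10/11): the language generated from the "good" state has a positive measure and the one from the "bad" state a negative one.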

5

Quantitative considerations on finding the shortest descriptions for meaningful symbolic sequences

Dębowski Ł.

Prace Instytutu Podstaw Informatyki Polskiej Akademii Nauk | 2001 | No. 924 | pp. 1--36

EN

These notes provide elements of a new quantitative theory of unsupervised learning from pragmatic language communication. It is argued that a suitable, paradox-free framework for quantitative inference should be based on minimum description length (MDL), interpreted as a simplified algorithmic complexity, rather than on classical frequentist probability. Furthermore, it is argued that the recently observed non-extensivity of entropy in meaningful symbolic sequences can arise if and only if the unsupervised acquisition of MDL theories for these sequences produces infinite theories, and also when that acquisition is optimal. This result rigorously undermines the belief that a finite formal theory of natural language could be constructed by hand by any expert. On the other hand, unsupervised machine learning is pointed out as a feasible, and the only right, way of implementing language competence in AIs. From this perspective, a promising compression-based learning algorithm by de Marcken is discussed, together with its efficiency and possible extensions. Important parallels with research in cognitive science and statistical physics are also pointed out. Thus, the notes may be of interest not only to computer scientists and linguists but also to other statistical and symbolic theorists.
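To make "minimum description length" concrete, here is a toy two-part code: the total cost of a segmentation of a text is the number of bits needed to spell out its lexicon (the model) plus the bits needed to encode the token sequence with the lexicon's empirical frequencies (the data given the model). This is a generic illustration under simplifying assumptions (a naive spelling code for lexicon entries, no cost for transmitting the token count), not de Marcken's algorithm or the formalism of these notes; the function name `description_length` is hypothetical.

```python
import math
from collections import Counter


def description_length(segmentation, alphabet_size):
    """Two-part code length in bits for a text split into lexicon entries.

    segmentation: list of tokens whose concatenation is the text.
    Model part: each distinct entry is spelled out symbol by symbol plus an
    end-of-entry marker, at log2(alphabet_size + 1) bits per symbol.
    Data part: each token costs -log2 of its relative frequency in the
    segmentation.
    """
    counts = Counter(segmentation)
    total = len(segmentation)
    model_bits = sum((len(w) + 1) * math.log2(alphabet_size + 1) for w in counts)
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return model_bits + data_bits


TEXT = "abababab"
CHARS = list(TEXT)                                         # lexicon {a, b}
PAIRS = [TEXT[i:i + 2] for i in range(0, len(TEXT), 2)]   # lexicon {ab}
```

For `abababab` over a two-letter alphabet, the segmentation into four copies of `ab` costs about 4.75 bits, versus about 14.34 bits for eight single letters, so the MDL criterion prefers the lexicon {ab}, which captures the regularity of the text.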
