Wyniki wyszukiwania - BazTech

1

Evaluating syntactic proposals using minimalist grammars and minimum description length

Ermolaeva Marina

Journal of Language Modelling

|

2023

|

Vol. 11, No. 1

67--119

EN

Many patterns found in natural language syntax have multiple pos-sible explanations or structural descriptions. Even within the cur-rently dominant Minimalist framework (Chomsky 1995, 2000), it is not uncommon to encounter multiple types of analyses for the same phenomenon proposed in the literature. A natural question, then, is whether one could evaluate and compare syntactic proposals from a quantitative point of view. In this paper, we show how an evaluation measure inspired by the minimum description length principle (Rissa-nen 1978) can be used to compare accounts of syntactic phenomena implemented as minimalist grammars (Stabler 1997), and how argu-ments for and against this kind of analysis translate into quantitative differences.

2

Simplicity and learning to distinguish arguments from modifiers

Bergen Leon, Gibson Edward, O'Donnell Timothy J.

Journal of Language Modelling

|

2022

|

Vol. 10, No. 2

241--286

EN

We present a learnability analysis of the argument-modifier distinction, asking whether there is information in the distribution of English constituents that could allow learners to identify which constituents are arguments and which are modifiers. We first develop a general description of some of the ways in which arguments and modifiers differ in distribution. We then identify two models from the literature that can capture these differences, which we call the argument-only model and the argument-modifier model. We employ these models using a common learning framework based on two simplicity biases which tradeoff against one another. The first bias favors a small lexicon with highly reusable lexical items, and the second, opposing, bias favors simple derivations of individual forms – those using small numbers of lexical items. Our first empirical study shows that the argument-modifier model is able to recover the argument-modifier status of many individual constituents when evaluated against a gold standard. This provides evidence in favor of our general account of the distributional differences between arguments and modifiers. It also suggests a kind of lower bound on the amount of information that a suitably equipped learner could use to identify which phrases are arguments or modifiers. We then present a series of analyses investigating how and why the argument-modifier model is able to recover the argument-modifier status of some constituents. In particular, we show that the argumentmodifier model is able to provide a simpler description of the input corpus than the argument-only model, both in terms of lexicon size, and in terms of the complexity of individual derivations. Intuitively, the argument-modifier model is able to do this because it is able to ignore spurious modifier structure when learning the lexicon. These analyses further support our general account of the differences between arguments and modifiers, as well as our simplicity-based approach to learning.

3

Against strict headedness in syntax

Lichte Timm

Journal of Language Modelling

|

2021

|

Vol. 9, No. 2

291--348

EN

Strict headedness is a common idealization in the structural analysis of linguistic entities, particularly in syntax. This contribution takes a critical look at its premises and applications by demonstrating the surprising sloppiness of both the defining concepts and the test procedures, and by showing how strict headedness is nevertheless implemented as an important axiom into virtually all mainstream grammar formalisms. Subsequently, I present a non-trivial head-agnostic analysis based on Tree Unification & Constraints (TUCO) in order to show that there actually is a choice and that strict headedness can be avoided in principle.

4

Trying to Understand PEG

Redziejowski R. R.

Fundamenta Informaticae

|

2018

|

Vol. 157, nr 4

463--475

EN

Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. Its properties are useful in many applications, but it is not well understood as a language definition tool. In its appearance, PEG is almost identical to a grammar in the Extended Backus-Naur Form (EBNF), and one may expect it to define the same language. But, due to the limited backtracking, PEG may reject some strings defined by EBNF, which gives an impression of PEG being unpredictable. We note that for some grammars, the limited backtracking is “efficient”, in the sense that it exhausts all possibilities. A PEG with efficient backtracking should therefore be easy to understand. There is no general algorithm to check if the grammar has efficient backtracking, but it can be often checked by inspection. The paper outlines an interactive tool to facilitate such inspection.

5

Characteristics and decomposition of expressions in the PF-notation

Balcer M.

Silesian Journal of Pure and Applied Mathematics

|

2016

|

Vol. 6, iss. 1

5--22

EN

The paper presents a selected aspects of classical parenthesis-free notation. With the introduction of the concepts of the pattern of expression and the characteristics were obtained convenient tools for classiﬁcation and decomposition of expressions in the PF-notation. Some original results and the independent proofs of known results are presented.

6

Representing syntax by means of properties : a formal framework for descriptive approaches

Blache P.

Journal of Language Modelling

|

2016

|

Vol. 4, No. 2

183--224

EN

Linguistic description and language modelling need to be formalny sound and complete while still being supported by data. We present a linguistic framework that bridges such formal and descriptive requirements, based on the representation of syntactic information by means of local properties. This approach, called Property Grammars, provides a formal basis for the description of specific characteristics as well as entire constructions. In contrast with other formalisms, all information is represented at the same level (no property playing a more important role than another) and independently (any property being evaluable separately). As a consequence, a syntactic description, instead of a complete hierarchical structure (typically a tree), is a set of multiple relations between words. This characteristic is crucial when describing unrestricted data, including spoken language. We show in this paper how local properties can implement any kind of syntactic information and constitute a formal framework for the representation of constructions (seen as a set of interacting properties). The Property Grammars approach thus offers the possibility to integrate the description of local phenomena into a general formal framework.

7

High-level methodologies for grammar engineering. Introduction to the special issue

Duchier D., Parmentier Y.

Journal of Language Modelling

|

2015

|

Vol. 3, No. 1

5--19

EN

Grammar engineering is the task of designing and implementing linguistically motivated electronic descriptions of natural language (socalled grammars). These grammars are expressed within well-defined theoretical frameworks, and offer a fine-grained description of natural language. While grammars were first used to describe syntax, that is to say, the relations between constituents in a sentence, they often go beyond syntax and include semantic information. Grammar engineering provides precise descriptions which can be used for natural language understanding and generation, making these valuable resources for various natural language applications, including textual entailment, dialogue systems, or machine translation. The first attempts at designing large-scale resource grammars were costly because of the complexity of the task (Erbach 1990) and of the number of persons that were needed (see e.g. Doran et al. 1997). Advances in the field have led to the development of environments for semi-automatic grammar engineering, borrowing ideas from compilation (grammar engineering is compared with software development) and machine learning. This special issue reports on new trends in the field, where grammar engineering benefits from elaborate high-level methodologies and techniques, dealing with various issues (both theoretical and practical).

8

Post-structural games of architecture

Serafin A.

Czasopismo Techniczne. Architektura

|

2015

|

Y. 112, iss. 9-A

313--318

EN

The play of architectural emblems that is cited in the conference thesis often seems to lead to a revaluation of the visual realm of architecture. This text therefore deliberately manipulates the concept of post-structuralism as a general cultural trend, avoiding reference to any architectural styles. The author, however, attempts to draw up a classification of post-structural architectural games, including the interdisciplinary.

PL

Przywołana w tezach konferencyjnych architektoniczna zabawa w emblematy często zdaje się prowadzić do przewartościowania wizualnej sfery architektury. Niniejszy tekst zatem celowo operuje pojęciem postrukturalizmu, rozumianego jako szeroki front ogólnokulturowy, unikając powoływania się na jakiekolwiek style architektoniczne. Autor podejmuje natomiast próbę zarysowania systematyki gier, także interdyscyplinarnych, w jakie uwikłana jest architektura o podłożu postrukturalnym.

9

"Mówienie o sobie" w Piotra Zaremby wspomnieniach prezydenta Szczecina

Kozaryn D.

Przestrzeń i Forma

|

2014

|

nr 22/3

179--188

PL

Artykuł dotyczy wybranych sposobów „mówienia o sobie”, rozumianego jako sposoby wyrażania przez mówiącego stanów przez siebie doznawanych, głoszonych przez siebie sądów, przekonań, swej woli, w Piotra Zaremby części piątej Wspomnień prezydenta Szczecina: 1949 – rok stabilizacji. Przeprowadzone rozważania, w kontekście języka osobniczego, pozwoliły na stwierdzenie, że język pierwszego prezydenta Szczecina jest aktualizacją polskiego języka ogólnego typową dla ludzi wykształconych przed wojną, dla których język był wartością, którzy zdawali sobie sprawę z jego roli i siły jako elementu kreującego rzeczywistość. Nie ma jednak podstaw do uznania, że Piotr Zaremba był jednostkę wybitną w tym względzie.

EN

The article discusses selected methods of "talking about oneself", understood as the methods of expressing by the speaker of the experienced states, beliefs and his intentions in the fifth part of Piotr Zaremba's Memories of the Szczecin City Mayor: 1949 – the Stabilization Year. The discussion, in the context of individual language, made it possible to conclude that the language of the first Szczecin mayor is the actualization of the general Polish language typical for educated people before the war, for whom their language was a value, who realised the role and the power of language as an element creating the reality. Nevertheless, there are no grounds for considering Piotr Zaremba as an outstanding individual in this respect.

10

Constructions with Lexical Integrity

Asudeh A., Dalrymple M., Toivonen I.

Journal of Language Modelling

|

2013

|

Vol. 1, No. 1

1--54

EN

Construction Grammar holds that unpredictable form-meaning combinations are not restricted in size. In particular, there may be phrases that have particular meanings that are not predictable from the words that they contain, but which are nonetheless not purely idiosyncratic. In addressing this observation, some construction grammarians have not only weakened the word/phrase distinction, but also denied the lexicon/grammar distinction. In this paper, we consider the word/phrase and lexicon/grammar distinction in light of Lexical-Functional Grammar and its Lexical Integrity Principle. We show that it is not necessary to remove the word/phrase distinction or the lexicon/grammar distinction to capture constructional effects, although we agree that there are important generalizations involving constructions of all sizes that must be captured at both syntactic and semantic levels. We use LFG’s templates, bundles of grammatical descriptions, to factor out grammatical information in such a way that it can be invoked either by words or by construction-specific phrase structure rules. Phrase structure rules that invoke specific templates are thus the equivalent of phrasal constructions in our approach, but Lexical Integrity and the separation of word and phrase are preserved. Constructional effects are captured by systematically allowing words and phrases to contribute comparable information to LFG’s level of functional structure; this is just a generalization of LFG’s usual assumption that “morphology competes with syntax” (Bresnan, 2001).

11

Exploiting Prosody for Automatic Syntactic Phrase Boundary Detection in Speech

Szaszák G., Beke A.

Journal of Language Modelling

|

2012

|

Vol. 0, No. 1

143--172

EN

The relation between syntax and prosody is evident, even if the prosodic structure cannot be directly mapped to the syntactic one and vice versa. Syntax-to-prosody mapping is widely used in text-to-speech applications, but prosody-to-syntax mapping is mostly missing from automatic speech recognition/understanding systems. This paper presents an experiment towards filling this gap and evaluating whether a HMM-based automatic prosodic segmentation tool can be used to support the reconstruction of the syntactic structure directly from speech. Results show that up to 85% of syntactic clause boundaries and up to about 70% of embedded syntactic phrase boundaries could be identified based on the detection of phonological phrases. Recall rates do not depend further on syntactic layering, in other words, whether the phrase is multiply embedded or not. Clause boundaries can be well assigned to intonational phrase level in read speech and can be well separated from lower level syntactic phrases based on the type of the aligned phonological phrase(s). These findings can be exploited in speech understanding systems, allowing for the recovery of the skeleton of the syntactic structure, based purely on the speech signal.

12

Graded Alternating-Time Temporal Logic

Faella M., Napoli M., Parente M.

Fundamenta Informaticae

|

2011

|

Vol. 105, nr 1/2

189-210

EN

Recently, temporal logics such as μ-calculus and Computational Tree Logic, CTL, augmented with graded modalities have received attention from the scientific community, both from a theoretical side and from an applicative perspective. In both these settings, graded modalities enrich the universal and existential quantifiers with the capability to express the concept of at least k or all but k, for a non-negative integer k. Both μ-calculus and CTL naturally apply as specification languages for closed systems: in this paper, we study how graded modalities may affect specification languages for open systems. We extend the Alternating-time Temporal Logic (ATL) introduced by Alur et al., that is a derivative of CTL interpreted on game structures, rather than transition systems. We solve the model-checking problem in the concurrent and turn-based settings, proving its PTIME-completeness. We present, and compare with each other, two different semantics: the first seems suitable to off-line synthesis applicationswhile the secondmay find application in the verification of fault-tolerant controllers. We also study the case where players can only employ memoryless strategies, showing that also in this case the model-checking problem is in PTIME.

13

Interactive Systems with Registers and Voices

Stefanescu G.

Fundamenta Informaticae

|

2006

|

Vol. 73, nr 1-2

285-305

EN

We present a model and a core programming language appropriate for modeling and programming interactive computing systems. The model consists of rv-systems (interactive systems with registers and voices); it includes register machines, is space-time invariant, is compositional, may describe computations extending in both time and space, and is applicable to open, interactive systems. To achieve modularity in space the model uses voices (a voice is the time dual of a register) - they provide a high level organization of temporal data and are used to describe interaction interfaces of processes. The programming language uses novel techniques for syntax and semantics to support computation in space paradigm. We describe rv-programs and base their syntax and operational semantics on FIS-es (finite interactive systems) and their grid languages (a FIS is a kind of 2-dimensional automaton specifying both control and interaction used in rv-programs). We also present specification techniques for rv-systems, using relations between input registers and voices and their output counterparts. The paper includes simple specifications for an OO-system and for an interactive game.

14

Biosyntax : An Overview

Bel-Enguix G., Jiménez-López M. D.

Fundamenta Informaticae

|

2005

|

Vol. 64, nr 1-4

17-28

EN

In this paper we consider a new framework for linguistics based on the behavior of DNA molecules: biosyntax. This new framework includes two approaches - molecular syntax and recombination patterns - that seem to be quite suitable for explaining in a completely new way some syntactic phenomena. Molecular syntax and recombination patterns are two different formalisms with the same single idea: mechanisms at work in biology may be used in the field of linguistics and natural language processing and may provide a simpler and more efficient approach to the description of the syntax of natural languages.

15

Sign, information, and consciousness.

Combs A., Brier S.

Systems : journal of transdisciplinary systems science

|

2000

|

Vol. 5, No. 1-2

15-24

EN

This paper proposes a multilevel theory of the nature of information. It reviews three historical contexts for the discussion of information. The first is the AI context of the Shannon-Weaver mathematical theory of communication. The second is the context of syntax and semantics. And the third is the context of human consciousness. The paper seeks to unify these seemingly disparate contexts though a contemporary understanding of Charles Peirce's philosophy of semiotics.

16

Integration of structural pattern recognition methods into knowledge-based framework. Application to geographic map image analysis

Stąpor K.

Studia Informatica

|

2000

|

Vol. 21, nr 4

1-212

EN

In this thesis the generally applicable reasoning framework with the developed and properly integrated within it structural classifiers for automatic conversion of paper documents into a digital form is proposed. As the application of the proposed methods, the Polish Fundamental Land Map has been chosen. The proposed framework is based on the knowledge-based approach, and is composed of the three schemas: 1) map model, represented in the form of the developed hybrid semantic network which is a combination of the different knowledge representation formalisms: semantic network, rules, frame system and the object-oriented computation paradigm, 2) object detection methods (structural classifiers), 3) image analysis flow scheme composed of the two structures: mixed control mechanism and the non-monotonic reasoning method called complementary reasoning. The image analysis flow scheme, which is based on the proposed intelligent analysis strategy, is accomplished by having the initial recognition of seed objects followed by the progressive extraction of the remaining layers of geographic objects through the iteration of the 4-step interpretation cycle: hypotheses generation, compatibility examination, scenario selection, and scenario verification. Each of the developed structural classifiers of the different graphics/text components of map drawing is based on the appropriate object model. The simplest is the polygon detector which is based on the defined polygon models. The second detector relies on the relational model of a general 2D curve, and the recognition method consists of matching relational structures (i.e. searching for relational homomorphism), followed by distance calculation in a parameter space. The third detector is based on the error-tolerant graph matching procedure between objects represented by the attributed graphs which is a search for an optimal many-to-one graphs homomorphism. The most general detector is the fourth one, in which objects are represented by the programmed, higher-dimensional extension of a string grammar, and the recognition is performed by error-correcting, mixed (top-down/bottom-up) parsing. All detectors are translational and rotational invariant and are tolerant to many complex structural deformations, including topological ones. The proposed framework is composed of the following modules: 1) reasoning engine which is is a global coordinator of the reasoning performed by the independent object instances, 2) detectors manager, 3) detectors which are based on the different pattern recognition techniques and use the intrinsic properties of objects for model-based recognition, 4) vector graph construction module, 5) resegmentation module which realizes the refining matches by resegmentation technique to recover the missing object features, and 5) evidence pool which is a database where accumulation of object instances and hypotheses is performed during the analysis. The following 4-level structure of geographic map image analysis process is proposed in the thesis: 1) pre-processing level composed of binarization, filled region extraction and thinning, 2) vector graph construction level, i.e. line structure tracking, vector graph segmentation, small and big graph vectorizations, 3) graphics and text recognition level, 4) understanding level. The proposed, general reasoning framework, together with the developed structural classifiers respond to the all posed problems of geographic map image analysis (reasoning with incomplete information, control strategy, contextual layer separation, reliable analysis) , as well as of the structural pattern recognition (limited descriptive power of string grammars, errorneous patterns, high number of prototypes, interaction of segmentation with recognition, selection of region of interest). The proposed approach to document image analysis enables for increased flexibilty: the image analysis flow scheme is independent of the application, the map model, dependent on the application, can be used for another ones only after small adjustments. The proposed framework has been implemented as the map conversion module MA PIN, which replaced the manual input of paper maps into the Geographic Information System, converting correctly about 70% of the information in the maps and leaving the remaining 30% to the operator.

PL

W rozprawie przedstawiono zaproponowaną, ogólną strukturę wnioskującą wraz z wbudowanymi strukturalnymi klasyfikatorami dla celu automatycznej analizy (konwersji) dokumentów z postaci papierowej do odpowiedniego formatu cyfrowego. Jako zastosowanie opracowanych metod wybrano Polską Zasadniczą Mapę Kraju. Zaproponowana struktura wnioskująca bazująca na opartym na wiedzy podejściu złożona jest z trzech schematów: 1) modelu mapy w postaci hybrydowej sieci semantycznej opartej na różnych formalizmach reprezentacji wiedzy: sieci semantycznej, reguł, ram i podejścia obiektowo-zorientowanego, 2) metod detekcji obiektów (strukturalnych klasyfikatorów), 3) schematu przebiegu analizy obrazu złożonego z dwóch podschematów: mieszanego mechanizmu kontroli oraz metody wnioskowania niemonotonicznego, zwanego wnioskowaniem uzupełniającym. Schemat przebiegu analizy obrazu, oparty na zaproponowanej, inteligentnej strategii analizy składa się z wstępnego rozpoznawania obiektów zarodkowych, po którym następuje ekstrakcja pozostałych warstw obiektów geograficznych poprzez iterację cyklu interpretacji złożonego z czterech kroków: generacji hipotez, testu kompatybilności, wyboru scenariusza oraz jego weryfikacji. Każdy z zaprojektowanych strukturalnych klasyfikatorów graficznych/tekstowych komponentów mapy wykorzystuje odpowiedni model obiektu. Najprostszym jest detektor wielokątów, który oparty jest na zdefiniowanych modelach wielokątów. Drugi z detektorów oparty jest na relacyjnym modelu krzywej, zaś algorytm rozpoznawania polega na uzgadnianiu struktur relacyjnych (poszukiwaniu relacyjnego homomorfizmu), po którym następuje obliczanie odległości w przestrzeni cech. Metoda rozpoznawania w trzecim detektorze jest przybliżonym, odpornym na strukturalne zniekształcenia porównywaniem grafów atrybutywnych reprezentujących obiekty wzorca oraz nieznanego obiektu. Najbardziej uniwersalnym jest detektor czwarty, w którym obiekty są reprezentowane za pomocą rozszerzonej o specjalne operatory, programowej gramatyki łańcuchowej, zaś rozpoznawanie polega na niedokładnej, mieszanej analizie syntaktycznej, z wbudowaną korekcją błędów. Wszystkie opracowane klasyfikatory są odporne na translację oraz obroty, a także wykazują dużą tolerancję na różnego typu deformacje strukturalne obiektów. Zaproponowana struktura wnioskująca zrealizowana jest za pomocą następujących modułów: 1) mechanizmu wnioskującego, który jest globalnym koordynatorem procesów wnioskowania poszczególnych obiektów, 2) modułu zarządzania detektorami obiektów, 3) detektorów, opartych na różnych metodach strukturalnego rozpoznawania obrazów, 4) modułu konstrukcji grafu wektorowego, 5) modułu resegmentacji, w którym wykonywane jest wielokrotne powtarzanie procedury segmentacji ze zmienionymi wartościami parametrów w przypadku niemożliwości detekcji poszukiwanych obiektów przez detektory, 5) bazy faktów, w której akumulowane są wszystkie pośrednie wyniki rozpoznawania (obiektów i hipotez). W pracy zaproponowano i zrealizowano następującą, 4-poziomową strukturę analizy map geograficznych: 1) przetwarzanie wstępne, obejmujące binaryzację, ekstrakcję regionów wypełnionych oraz szkieletyzację, 2) konstrukcję grafu wektorowego (śledzenie linii, segmentacja, wektoryzacja małego i dużego grafu, 3) rozpoznawanie komponentów graficznych oraz tekstowych, 4) interpretacja. Zaproponowana struktura wnioskująca wraz z wbudowanymi strukturalnymi klasyfikatorami pozwala na rozwiązanie postawionych w rozprawie problemów dotyczących zarówno samej automatycznej analizy map geograficznych (wnioskowanie z niepełną informacją, mechanizm kontroli, kontekstowa segmentacja, rzetelna analiza), jak również problemów strukturalnego rozpoznawania obrazów (ograniczona moc opisowa gramatyk łańcuchowych, generacja błędnych struktur, duża liczba strukturalnych prototypów, interakcja rozpoznawania i segmentacji, wybór właściwego obszaru do analizy). Zaproponowane podejście do automatycznej analizy dokumentów jest w dużym stopniu uniwersalne: schemat przebiegu analizy obrazu jest niezależny od rodzaju aplikacji, natomiast model mapy, zależny od typu dokumentu (syntaktyka), może być w prosty sposób zaadaptowany dla celów innej aplikacji. Przedstawiona struktura wnioskująca wraz z klasyfikatorami została zaimplementowana w postaci modułu półautomatycznej konwersji map, co pozwoliło na jej 70% automatyzację.

17

Image labelling by random graph parsing for syntactic scene description

Skomorowski M.

Foundations of Computing and Decision Sciences

|

1998

|

Vol. 23, No. 3

161-178

EN

A new approach to scene labelling is proposed. The proposed approach involves parsing for graph grammars. To take into account all variations of an ambiguous (distorted) scene under study, a probabilistic description of the scene is needed. Random graphs are proposed here for such a description. An efficient, O(n2), parsing algorithm for random graphs is proposed here as a tool for scene labelling. An example is provided.