This article describes a new and efficient algorithm for parsing (called tunnel parsing) that parses from left to right on the basis of context-free grammar without left recursion nor rules that recognize empty words. The algorithm is mostly applicable for domain-specific languages. In the article, particular attention is paid to the parsing of grammar element repetitions. As a result of the parsing, a statically typed concrete syntax tree is built from top to bottom, that accurately reflects the grammar. The parsing is not done through a recursion, but through an iteration. The tunnel parsing algorithm uses the grammars directly without a prior refactoring and is with a linear time complexity for deterministic context-free grammars.
A recently proposed balanced-bracket encoding (Yli-Jyrä and Gómez-Rodríguez 2017) has given us a way to embed all noncrossing dependency graphs into the string space and to formulate their exact arcfactored inference problem (Kuhlmann and Johnsson 2015) as the best string problem in a dynamically constructed and weighted unambiguous context-free grammar. The current work improves the encoding and makes it shallower by omitting redundant brackets from it. The streamlined encoding gives rise to a bounded-depth subset approximation that is represented by a small finite-state automaton. When bounded to 7 levels of balanced brackets, the automaton has 762 states and represents a strict superset of more than 99.9999% of the noncrossing trees available in Universal Dependencies 2.4 (Nivre et al. 2019). In addition, it strictly contains all 15-vertex noncrossing digraphs. When bounded to 4 levels and 90 states, the automaton still captures 99.2% of all noncrossing trees in the reference dataset. The approach is flexible and extensible towards unrestricted graphs, and it suggests tight finite-state bounds for dependency parsing, and for the main existing parsing methods.
Further results of research into graph grammar parsing for syntactic pattern recognition (Pattern Recognit. 21:623-629, 1988; 23:765-774, 1990; 24:1223-1224, 1991; 26:1-16, 1993; 43:249-2264, 2010; Comput. Vision Graph. Image Process. 47:1-21, 1989; Fundam. Inform. 80:379-413, 2007; Theoret. Comp. Sci. 201:189-231, 1998) are presented in the paper. The notion of interpreted graphs based on Tarski's model theory is introduced. The bottom-up parsing algorithm for ETPR(k) graph grammars is defined.
W artykule przedstawiono rezultaty badań metod, które pomogą tworzyć bazę danych o użytkownikach z wykorzystaniem serwisów internetowych jako podstawowego źródła informacji. Celem badania było pozyskanie danych o użytkownikach z sieci społecznościowej, sprawdzenie możliwości parsera oraz analiza efektywności wykorzystania utworzonej bazy danych. W badaniach zostały wykorzystane metody analizy tekstowej: parsing i scraping. Wyniki zostały przedstawione w postaci wykresów i poddane krytycznej analizie porównawczej.
EN
The article presents the results of a study that will help for user to create a end-user parameters database using other web-sites as the primary source. The main goal of the work is making analysis to obtain information about the user of social networks. Next goal is analysis of the gathering data and the possibility of their use. The experiment was done using two methods of text analysis: parsing and scraping. The results are presented graphically and critical compared to each other.
Further results of research into parsable graph grammars used for syntactic pattern recognition (Pattern Recognition: 21, 623-629 (1988); 23, 765-774 (1990); 24, 12-23 (1991); 26, 1-16 (1993); 43, 2249-2264 (2010), Comput. Vision Graph. Image Process. 47, 1-21 (1989), Computer-Aided Design 27, 403-433 (1995), Theoret. Comp. Sci. 201, 189-231 (1998), Pattern Analysis Applications bf 17, 465-480 (2014)) are presented in the paper. The generative power of reduction-based parsable ETPR(k) graph grammars is investigated. The analogy between the triad of CF - LL(k) - LR(k) string languages and the triad of NLC - ETPL(k) - ETPR(k) graph languages is discussed.
6
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. Its properties are useful in many applications, but it is not well understood as a language definition tool. In its appearance, PEG is almost identical to a grammar in the Extended Backus-Naur Form (EBNF), and one may expect it to define the same language. But, due to the limited backtracking, PEG may reject some strings defined by EBNF, which gives an impression of PEG being unpredictable. We note that for some grammars, the limited backtracking is “efficient”, in the sense that it exhausts all possibilities. A PEG with efficient backtracking should therefore be easy to understand. There is no general algorithm to check if the grammar has efficient backtracking, but it can be often checked by inspection. The paper outlines an interactive tool to facilitate such inspection.
Minimalist grammars have been used recently in a series of papers to explain well-known contrasts in human sentence processing in terms of subtle structural differences. These proposals combine a top-down parser with complexity metrics that relate parsing difficulty to memory usage. So far, though, there has been no large-scale exploration of the space of viable metrics. Building on this earlier work, we compare the ability of 1,600 metrics to derive several processing effects observed with relative clauses, many of which have been proven difficult to unify. We show that among those 1,600 candidates, a few metrics (and only a few) can provide a unified account of all these contrasts. This is a welcome result for two reasons: First, it provides a novel account of extensively studied psycholinguistic data. Second, it significantly limits the number of viable metrics that may be applied to other phenomena, thus reducing theoretical indeterminacy.
We prove a Chomsky-Schützenberger representation theorem for multiple context-free languages weighted over complete commutative strong bimonoids. Using this representation we devise a parsing algorithm for a restricted form of those devices.
9
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
In this article the algorithm for automated detection of non-relevant or wrong information on websites is introduced. The algorithm extracts the semantic information from the webpage using third party software and compares the semantic information with the reliable resources. Reliable information is identified by the means of majority voting or extracted from reliable databases.
A new type of graph is introduced, the grammar graph. The possibility of assigning labels to each node in such a graph extends it to the grammar net. The grammar net should be considered as a new graphical tool that helps in an analysis of whether a particular sentence belongs to a given context-sensitive grammar. Another concept, the derivation net, closely related to the grammar graph and of a similar structure, will be used to show an algorithm that is able to decide that some sentences do not belong to a language generated by a context sensitive grammar, while leaving others as a candidate members of it.
PL
W artykule wprowadzony został nowy rodzaj grafu – graf gramatyczny. Możliwość przypisywania etykiet do węzłów daje rozszerzenie do tzw. sieci gramatycznej. Sieć gramatyczną należy traktować jako nowe narzędzie graficzne w analizie przynależności zdań do danego języka kontekstowego. Inna koncepcja, sieć wywodu, ściśle związana z grafem gramatycznym i o podobnej strukturze, została wykorzystana do pokazania algorytmu, który potrafi wstępnie wyselekcjonować niektóre zdania nienależące do danego języka generowanego przez gramatykę kontekstową, pozostawiając inne jako potencjalnie w nim zawarte.
11
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
The parsing of a symbolic sequence into a set of short substrings called words invented by the author is used for a new definition of the distance between sequences. No sequence alignment is necessary. The most frequent among spectra of multiprotein sequences are selected and considered as a reference spectrum of the sequences. The distance between the reference spectrum and protein sequences is considered as the estimation of the evolutionary distance of the protein. As an application, amino acid sequences of the several mitochondria-encoded proteins of mammal species are ordered according to their evolutionary distance. Statistical distribution of the distances between exhibits some structures related to the evolutionary rate in the past.
12
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. It has been recently noticed that in the situation when the parser is to explore several alternatives one after another, no further alternatives need to be explored after the parser reached certain ”cut point”. This fact can be used to save both processing time and storage. The subject of the paper is identification of cut points, which can also help in producing better diagnostics.
The paper focuses on the adjustment of NLP tools for Polish; e.g., morphological analyzers and parsers, to user-generated content (UGC). The authors discuss two rule-based techniques applied to improve their efficiency: pre-processing (text normalization) and parser adaptation (modified segmentation and parsing rules). A new solution to handle OOVs based on inflectional translation is also offered.
This paper presents the development of a grammar and a syntactic parser for the Vietnamese language. We first discuss the construction of a lexicalized tree-adjoining grammar using an automatic extraction approach. We then present the construction and evaluation of a deep syntactic parser based on the extracted grammar. This is a complete system that produces syntactic structures for Vietnamese sentences. A dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees are also proposed. This is the first Vietnamese parsing system capable of producing both constituency and dependency analyses. It offers encouraging performance: accuracy of 69.33% and 73.21% for constituency and dependency analysis, respectively.
This paper is focused on the process of computing First Sets. The First Sets are used to build structures which control a syntax analyser (also known as parser). Three methods of creating First Sets were compared in terms of execution time. The first method is known sequential algorithm and the author’s own methods are concurrent computing sets for each non-terminal symbol (called the CEN method) and concurrent computing sets for each production (called the CEP method). These methods have been tested on personal computer. Three programming languages (including the C language) were used in the research. The results and the analysis of calculations allow the author to hypothesise that the problem of computing First Sets is hard to concurrence.
16
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. The parser has many useful properties. Converting PEG to an executable parser is a rather straightforward task. Unfortunately, PEG is not well understood as a language definition tool. It is thus of a practical interest to construct PEGs for languages specified in some familiar way, such as Backus-Naur Form (BNF). The problem was attacked by Medeiros in an elegant way by noticing that both PEG and BNF can be formally defined in a very similar way. Some of his results were extended in a previous paper by this author. We continue here with further extensions.
This paper discusses recently introduced kind of linguistically motivated restriction placed on tree-controlled grammars-context-free grammars with some root-to-leaf paths in their derivation trees restricted by a control language. We deal with restrictions placed on n greater-than or equal to 1 paths controlled by a deterministic context-free language, and we recall several basic properties of such a rewriting system. Then, we study the possibilities of corresponding parsing methods working in polynomial time and demonstrate that some non-context-free languages can be generated by this regulated rewriting model. Furthermore, we illustrate the syntax analysis of LL grammars with controlled paths. Finally, we briefly discuss how to base parsing methods on bottom-up syntax-analysis.
In this paper we present a parsing framework for extensions of Tree Adjoining Grammar (TAG) called TuLiPA (T¨ubingen Linguistic Parsing Architecture). In particular, besides TAG, the parser can process Tree-Tuple MCTAG with Shared Nodes (TT-MCTAG), a TAGextension which has been proposed to deal with scrambling in free word order languages such as German. The central strategy of the parser is such that the incoming TT-MCTAG (or TAG) is transformed into an equivalent Range Concatenation Grammar (RCG) which, in turn, is then used for parsing. The RCG parser is an incremental Earley-style chart parser. In addition to the syntactic anlysis, TuLiPA computes also an underspecified semantic analysis for grammars that are equipped with semantic representations.
19
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
We extend the result from [8] by giving also a concrete polynomial parsing algorithm for a class of languages generated by a variant of contextual grammars, namely local internal contextual grammars with context-free choice.
20
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
The first objective of this paper is to present a simple grammatical formalism named treegenerating Binary Grammar. The formalism is weakly equivqlent to CFGs, yet it is capable of generating various kinds of syntactic trees, including dependency trees. Its strong equivalence to some other grammatical formalisms is discussed. The second objective is to show how some free word order phenomena in Polish can be captured within the proposed formalism.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.