We describe methods of representing strings as real valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in the study of combinatorics over words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices, that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyper-parameter the resulting morphism is injective.
2
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
In this paper we investigate the injectivity of the Parikh matrix mapping. This research is done mainly on the binary alphabet. We identify a family of binary words, refered to as ``palindromic amiable'', such that two such words are palindromic amiable if and only if they have the same image by the Parikh matrix mapping. Some other related problems are discussed, too.
3
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
We deal with the notion of M-unambiguity [5] in connection with the Parikh matrix mapping introduced by Mateescu and others in [7]. M-unambiguity is studied both in terms of words and matrices and several sufficient criteria for M-unambiguity are provided in both cases, nontrivially generalizing the criteria based on the g-property introduced by Salomaa in [15]. Also, the notion of M-unambiguity with respect to a word is defined in connection with the extended Parikh matrix morphism [16] and some of the M-unambiguity criteria are lifted from the classical setting to the extended one.
4
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
We investigate the number of (scattered) subword occurrences and Parikh matrices, especially the case where the matrix determines the word uniquely. A condition introduced in this paper, called g-property, turns out to be a powerful tool for such unambiguous matrices. Interconnections with the general theory of subword histories are also pointed out.
5
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
We investigate binary words and languages having a balanced structure of (scattered) subwords. We introduce a "difference function" D for binary words. For D=0, the resulting language is properly context-sensitive. Parikh matrices constitute a useful technical tool in the study, we investigate also the independence of their entries. The investigation is extended to concern w-words and periodicity. For the Fibonacci word, the D-values are in many ways connected with the Fibonacci numbers.
6
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
The paper investigates inference based on quantities |w|u, the number of occurrences of a word u as a scattered subword of w. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers |w|u to w, as well as from certain entries of a Parikh matrix to other entries.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.