Uncertain sequences are compact representations of sets of similar strings. They highlight common segments by collapsing them, and explicitly represent varying segments by listing all possible options. A generalized degenerate string (GD string) is a type of uncertain sequence. Formally, a GD string Ŝ is a sequence of n sets of strings of total size N, where the ith set contains strings of the same length ki but this length can vary between different sets. We denote by W the sum of these lengths k0, k1, . . . , k n-1. Our main result is an 𝒪(N + M)-time algorithm for deciding whether two GD strings of total sizes N and M, respectively, over an integer alphabet, have a non-empty intersection. This result is based on a combinatorial result of independent interest: although the intersection of two GD strings can be exponential in the total size of the two strings, it can be represented in linear space. We then apply our string comparison tool to devise a simple algorithm for computing all palindromes in Ŝ in 𝒪(min{W , n 2}N )-time. We complement this upper bound by showing a similar conditional lower bound for computing maximal palindromes in Ŝ. We also show that a result, which is essentially the same as our string comparison linear-time algorithm, can be obtained by employing an automata-based approach.
2
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
In this work, we consider a special type of uncertain sequence called weighted string. In a weighted string every position contains a subset of the alphabet and every letter of the alphabet is associated with a probability of occurrence such that the sum of probabilities at each position equals 1. Usually a cumulative weight threshold 1/z is specified, and one considers only strings that match the weighted string with probability at least 1/z. We provide an O(nz)-time and O(nz)-space off-line algorithm, where n is the length of the weighted string and 1/z is the given threshold, to compute a smallest maximal palindromic factorization of a weighted string. This factorization has applications in hairpin structure prediction in a set of closely-related DNA or RNA sequences. Along the way, we provide an O(nz)-time and O(nz)-space off-line algorithm to compute maximal palindromes in weighted strings. Finally, we provide an experiment of our proposed algorithm.
3
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
We prove that a random word of length n over a k-ary fixed alphabet contains, on expectation,Θ(√n) distinct palindromic factors. We study this number of factors, E(n, k), in detail, showing that the limit limn→∞ E(n, k)=√n does not exist for any κ ≥ 2, lim infn→∞ E(n; k)=√n =Θ(1), and lim supn→∞ E(n; k)=√n = Θ(√k). Such a complicated behaviour stems from the asymmetry between the palindromes of even and odd length. We show that a similar, but much simpler, result on the expected number of squares in random words holds. We also provide some experimental data on the number of palindromic factors in random words.
4
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
We study the question of what can be said about a word based on the numbers of occurrences of certain factors in it. We do this by defining a family of equivalence relations that generalize the so called k-abelian equivalence. The characterizations and answers we obtain are linear algebraic. We also use these equivalence relations to help us in solving some problems related to repetitions and palindromes, and to point out that some previous results about Sturmian words and k-abelian equivalence hold in a more general form.
5
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
The longest common subsequence (LCS) problem is a classic and well-studied problem in computer science. Palindrome is a word which reads the same forward as it does backward. The longest common palindromic subsequence (LCPS) problem is a variant of the classic LCS problem which finds a longest common subsequence between two given strings such that the computed subsequence is also a palindrome. In this paper, we study the LCPS problem and give two novel algorithms to solve it. To the best of our knowledge, this is the first attempt to study and solve this problem.
6
Dostęp do pełnego tekstu na zewnętrznej witrynie WWW
In this paper we investigate the injectivity of the Parikh matrix mapping. This research is done mainly on the binary alphabet. We identify a family of binary words, refered to as ``palindromic amiable'', such that two such words are palindromic amiable if and only if they have the same image by the Parikh matrix mapping. Some other related problems are discussed, too.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.