PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Powiadomienia systemowe
  • Sesja wygasła!
  • Sesja wygasła!
  • Sesja wygasła!
Tytuł artykułu

Some Investigations on Similarity Measures Based on Absent Words

Wybrane pełne teksty z tego czasopisma
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. They make use of a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the minimal absent words M(x) and M(y) of two sequences x and y, respectively. We first propose a variant of this measure by choosing as a sample set a proper subset 𝒟(x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From the algebraic point of view, we prove that 𝒟(x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures are able to recognize whether the sequences x and y share a common structure, but they are not able to detect the difference on the number of occurrences of such a structure in the two sequences. In order to take into account such a multiplicity, we introduce the notion of multifactor, and define a new measure that uses both absent words and multifactors. Surprisingly, we prove that this similarity measure coincides with a distance on sequences introduced by Ehrenfeucht and Haussler in [2], in the context of block-moves strategies. In this way, our result creates a non trivial bridge between similarity measures based on absent words and those based on the block-moves approach.
Wydawca
Rocznik
Strony
97--112
Opis fizyczny
Bibliogr. 14 poz., rys.
Twórcy
  • Dipartimento di Matematica e Informatica, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
  • Dipartimento di Matematica e Informatica, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
  • Dipartimento di Matematica e Informatica, Universitá di Palermo, Via Archirafi 34, 90123 Palermo, Italy
Bibliografia
  • [1] Chairungsee S, Crochemore M. Using minimal absent words to build phylogeny. Theor. Comput. Sci., 2012. 450:109-116. URL https://doi.org/10.1016/j.tcs.2012.04.031.
  • [2] Ehrenfeucht A, Haussler D. A new distance metric on strings computable in linear time. Discrete Applied Mathematics, 1988. 20(3):191-203. doi:10.1016/0166-218X(88)90076-5.
  • [3] Crochemore M, Mignosi F, Restivo A. Automata and Forbidden Words. Inf. Process. Lett., 1998. 67(3): 111-117. URL https://doi.org/10.1016/S0020-0190(98)00104-5.
  • [4] Charalampopoulos P, Crochemore M, Fici G, Mercas R, Pissis SP. Alignment-free sequence comparison using absent words. Inf. Comput., 2018. 262(Part):57-68. URL https://doi.org/10.1016/j.ic.2018.06.002.
  • [5] Rahman MS, Alatabbi A, Crochemore M, Rahman MS. Absent words and the (dis)similarity analysis of DNA sequences: An experimental study. BMC Research Notes., 2016. 9:186 450:1-8. doi:10.1186/s13104-016-1972-z.
  • [6] Kaplan H, Shafrir N. The greedy algorithm for edit distance with moves. Information Processing Letters, 2006. 97(1):23-27. URL https://doi.org/10.1016/j.ipl.2005.08.010.
  • [7] Shapira D, Storer JA. Edit Distance with Move Operations. Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science,, 2002 pp. 85-98. 2373. doi:10.1007/3-540-45452-7_9.
  • [8] Shapira D, Storer JA. Edit distance with move operations. Journal of Discrete Algorithms, 2007. 5(2):380-392. 2004 Symposium on String Processing and Information Retrieval. URL https://doi.org/10.1016/j.jda.2005.01.010.
  • [9] Tichy WF. The String-to-String Correction Problem with Block Moves. ACM Trans. Comput. Syst., 1984. 2(4):309-321.
  • [10] Berstel J, Perrin D, Reutenauer C. Codes and Automata (Encyclopedia of Mathematics and Its Applications). Cambridge University Press, New York, NY, USA, 1st edition, 2009. ISBN-052188831X, 9780521888318.
  • [11] Lothaire M. Combinatorics on Words. Addison-Wesley, 1983. ISBN-978-0-12-198820-3.
  • [12] Day JD, Fleischmann P, Manea F, Nowotka D. k-Spectra of c-Balanced Words. CoRR, 2019. abs/1904.09125. URL http://arxiv.org/abs/1904.09125.
  • [13] Apostolico A, Preparata FP. Data Structures and Algorithms for the String Statistics Problem. Algorithmica, 1996. 15(5):481-494. doi:10.1007/BF01955046.
  • [14] Hooshmand S, Abedin P, Külekci MO, Thankachan SV. Non-Overlapping Indexing - Cache Obliviously. In: Annual Symposium on Combinatorial Pattern Matching, CPM 2018, July 2-4, 2018 - Qingdao, China. 2018 pp. 8:1-8:9. doi:10.4230/LIPIcs.CPM.2018.8.
Uwagi
Opracowanie rekordu ze środków MNiSW, umowa Nr 461252 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2020).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-5cb7b081-71cc-4daf-a1f7-575f8ac01ad4
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.