PL EN


Preferencje help
Widoczny [Schowaj] Abstrakt
Liczba wyników
Powiadomienia systemowe
  • Sesja wygasła!
Tytuł artykułu

The expected sum of edge lengthsin planar linearizations of trees

Treść / Zawartość
Identyfikatory
Warianty tytułu
Języki publikacji
EN
Abstrakty
EN
Dependency trees have proven to be a very successful model to rep-resent the syntactic structure of sentences of human languages. Inthese structures, vertices are words and edges connect syntactically-dependent words. The tendency of these dependencies to be short hasbeen demonstrated using random baselines for the sum of the lengthsof the edges or their variants. A ubiquitous baseline is the expectedsum in projective orderings (wherein edges do not cross and the rootword of the sentence is not covered by any edge), that can be com-puted in timeO(n). Here we focus on a weaker formal constraint,namely planarity. In the theoretical domain, we present a characteri-zation of planarity that, given a sentence, yields either the number ofplanar permutations or an efficient algorithm to generate uniformlyrandom planar permutations of the words. We also show the relation-ship between the expected sum in planar arrangements and the ex-pected sum in projective arrangements. In the domain of applications,we derive aO(n)-time algorithm to calculate the expected value ofthe sum of edge lengths. We also apply this research to a parallel cor-pus and find that the gap between actual dependency distance and therandombaselinereducesasthestrengthoftheformalconstraintonde-pendency structures increases, suggesting that formal constraints ab-sorbpartofthedependencydistanceminimizationeffect.Ourresearchpaves the way for replicating past research on dependency distanceminimization using random planar linearizations as random baseline.
Rocznik
Strony
1--42
Opis fizyczny
Bibliogr. 38 poz., rys., tab.
Twórcy
  • Quantitative, Mathematical and Computational Linguistics ResearchGroup. Departament de Ciències de la Computació, Universitat Politècnicade Catalunya (UPC), Barcelona,Catalonia, Spain
  • Quantitative, Mathematical and Computational Linguistics Research Group. Departament de Ciències de la Computació, Universitat Politècnicade Catalunya (UPC), Barcelona,Catalonia, Spain
Bibliografia
  • 1. Lluís ALEMANY-PUIG, Juan Luis ESTEBAN, and Ramon FERRER-I-CANCHO(2021), The Linear Arrangement Library. A new tool for research on syntacticdependency structures, inProceedings of the Second Workshop on QuantitativeSyntax (Quasy, SyntaxFest 2021), pp. 1-16, Association for ComputationalLinguistics, Sofia, Bulgaria,https://aclanthology.org/2021.quasy-1.1.
  • 2. Lluís ALEMANY-PUIG, Juan Luis ESTEBAN, and Ramon FERRER-I-CANCHO(2022), Minimum projective linearizations of trees in linear time,InformationProcessing Letters, 174:106204, doi:10.1016/j.ipl.2021.106204.
  • 3. Lluís ALEMANY-PUIG and Ramon FERRER-I-CANCHO (2022), Linear-timecalculation of the expected sum of edge lengths in projective linearizations oftrees,Computational Linguistics, 48(3):491-516, doi:10.1162/coli_a_00442.
  • 4. Frank BERNHART and Paul C. KAINEN (1979), The book thickness of a graph,Journal of Combinatorial Theory, Series B, 27(3):320-331,doi:10.1016/0095-8956(79)90021-2.
  • 5. Fan R. K. CHUNG (1984), On optimal linear arrangements of trees,Computers &Mathematics with Applications, 10(1):43-60,doi:10.1016/0898-1221(84)90085-3.
  • 6. Thomas H. CORMEN, Charles E. LEISERSON, Ronald L. RIVEST, and CliffordSTEIN (2001),Introduction to algorithms, The MIT Press, Cambridge, MA, USA,2nd edition.
  • 7. Ramon FERRER-I-CANCHO (2004), Euclidean distance between syntacticallylinked words,Physical Review E, 70:056135, doi:10.1103/PhysRevE.70.056135.
  • 8. Ramon FERRER-I-CANCHO (2006), Why do syntactic links not cross?,Europhysics Letters (EPL), 76(6):1228-1235, doi:10.1209/epl/i2006-10406-0.
  • 9. Ramon FERRER-I-CANCHO (2019), The sum of edge lengths in random lineararrangements,Journal of Statistical Mechanics, 2019(5):053401,doi:10.1088/1742-5468/ab11e2.
  • 10. Ramon FERRER-I-CANCHO and Carlos GÓMEZ-RODRÍGUEZ (2021), Antidependency distance minimization in short sequences. a graph theoreticapproach,Journal of Quantitative Linguistics, 28(1):50-76,doi:10.1080/09296174.2019.1645547.
  • 11. Ramon FERRER-I-CANCHO, Carlos GÓMEZ-RODRÍGUEZ, and Juan LuisESTEBAN (2018), Are crossing dependencies really scarce?,Physica A: StatisticalMechanics and its Applications, 493:311-329, doi:10.1016/j.physa.2017.10.048.
  • 12. Ramon FERRER-I-CANCHO, Carlos GÓMEZ-RODRÍGUEZ, Juan Luis ESTEBAN,and Lluís ALEMANY-PUIG (2022), Optimality of syntactic dependencydistances,Physical Review E, 105:014308, doi:10.1103/PhysRevE.105.014308.
  • 13. Ramon FERRER-I-CANCHO and Haitao LIU (2014), The risks of mixingdependency lengths from sequences of different length,Glottotheory,5:143-155, doi:10.1515/glot-2014-0014.
  • 14. Richard FUTRELL, Kyle MAHOWALD, and Edward GIBSON (2015), Large-scaleevidence of dependency length minimization in 37 languages,Proceedings of theNational Academy of Sciences, 112(33):10336-10341,doi:10.1073/pnas.1502134112.
  • 15. Kim GERDES, Bruno GUILLAUME, Sylvain KAHANE, and Guy PERRIER (2018),SUD or surface-syntactic universal dependencies: an annotation schemenear-isomorphic to UD, inProceedings of the Second Workshop on UniversalDependencies (UDW 2018), pp. 66-74, Association for ComputationalLinguistics, Brussels, Belgium, doi:10.18653/v1/W18-6008.
  • 16. Daniel GILDEA and David TEMPERLEY (2007), Optimizing grammars forminimum dependency length, inProceedings of the 45th Annual Meeting of theAssociation of Computational Linguistics, pp. 184-191, Association forComputational Linguistics, Prague, Czech Republic,https://www.aclweb.org/anthology/P07-1024.
  • 17. David GILDEA and David TEMPERLEY (2010), Do grammars minimizedependency length?,Cognitive Science, 34(2):286-310,doi:10.1111/j.1551-6709.2009.01073.x.Carlos GÓMEZ-RODRÍGUEZ (2016), Restricted non-projectivity: Coverage vs.efficiency,Computational Linguistics, 42(4):809-817,doi:10.1162/COLI_a_00267.
  • 18. Carlos GÓMEZ-RODRÍGUEZ, Morten H. CHRISTIANSEN, and RamonFERRER-I-CANCHO (2022), Memory limitations are hidden in grammar,Glottometrics, 52:39-64, doi:10.53482/2022_52_397.
  • 19. Carlos GÓMEZ-RODRÍGUEZ and Ramon FERRER-I-CANCHO (2017), Scarcity ofcrossing dependencies: a direct outcome of a specific constraint?,Physics ReviewE, 96:062304, doi:10.1103/PhysRevE.96.062304.
  • 20. Carlos GÓMEZ-RODRÍGUEZ and Joakim G (2010), A transition-based parser for2-planar dependency structures, inProceedings of the 48th Annual Meeting of theAssociation for Computational Linguistics, pp. 1492-1501, Association forComputational Linguistics, Uppsala, Sweden,https://aclanthology.org/P10-1151.
  • 21. Thomas GROẞ and Timothy OSBORNE (2009), Toward a practical dependencygrammar theory of discontinuities,SKY Journal of Linguistics, 22:43-90,http://www.linguistics.fi/julkaisut/sky2009.shtml.David S. GUNDERSON (2014),Handbook of Mathematical Induction: Theory andApplications, Discrete Mathematics and Its Applications, CRC Press, ISBN9781420093643,https://www.routledgehandbooks.com/doi/10.1201/9781420093650.
  • 22. Jiří HAVELKA (2007), Beyond projectivity: multilingual evaluation ofconstraints and measures on non-projective structures, inProceedings of the 45thAnnual Meeting of the Association of Computational Linguistics, pp. 608-615,Association for Computational Linguistics, Prague, Czech Republic,https://aclanthology.org/P07-1077.
  • 23. Robert A. HOCHBERG and Matthias F. STALLMANN (2003), Optimal one-pagetree embeddings in linear time,Information Processing Letters, 87(2):59-66,doi:10.1016/S0020-0190(03)00261-8.
  • 24. Richard HUDSON (1995), Measuring syntactic difficulty,Unpublished paper,https://dickhudson.com/wp-content/uploads/2013/07/Difficulty.pdf.
  • 25. Alex KRAMER (2021), Dependency lengths in speech and writing: across-linguistic comparison via YouDePP, a pipeline for scraping and parsingYouTube captions, inProceedings of the Society for Computation in Linguistics,volume 4, pp. 359-365, doi:10.7275/pz9g-d780.
  • 26. Marco KUHLMANN and Joakim NIVRE (2006), Mildly non-projectivedependency structures, inProceedings of the COLING/ACL 2006 Main ConferencePoster Sessions, COLING-ACL ’06, pp. 507-514, doi:10.3115/1273073.1273139.
  • 27. Haitao LIU (2008), Dependency distance as a metric of languagecomprehension difficulty,Journal of Cognitive Science, 9(2):159-191,doi:10.17791/jcs.2008.9.2.159.Haitao LIU, Chunshan XU, and Junying LIANG (2017), Dependency distance: anew perspective on syntactic patterns in natural languages,Physics of LifeReviews, 21:171-193, doi:10.1016/j.plrev.2017.03.002.
  • 28. Michael MITZENMACHER and Eli UPFAL (2017),Probability and computing.Randomization and probabilistic techniques in algorithms and data analysis,Cambridge University Press, ISBN 978-1-107-15488-9.
  • 29. Glyn MORRILL (2000), Incremental processing and acceptability,ComputationalLinguistics, 25(3):319-338, doi:10.1162/089120100561728.
  • 30. Joakim NIVRE (2006), Constraints on non-projective dependency parsing, inDiana MCCARTHY and Shuly WINTNER, editors,11th Conference of theEuropean Chapter of the Association for Computational Linguistics, pp. 73-80,Association for Computational Linguistics, Trento, Italy,https://aclanthology.org/E06-1010.
  • 31. Joakim NIVRE (2009), Non-projective dependency parsing in expected lineartime, inProceedings of the Joint Conference of the 47th Annual Meeting of the ACLand the 4th International Joint Conference on Natural Language Processing of theAFNLP, ACL ’09, pp. 351-359, Association for Computational Linguistics,Stroudsburg, PA, USA,https://aclanthology.org/P09-1040.
  • 32. Y. Albert PARK and Roger P. LEVY (2009), Minimal-length linearizations formildly context-sensitive dependency trees, inProceedings of Human LanguageTechnologies: The 2009 Annual Conference of the North American Chapter of theAssociation for Computational Linguistics, pp. 335-343, Association forComputational Linguistics, Stroudsburg, PA, USA,https://aclanthology.org/N09-1038/.
  • 33. Yossi SHILOACH (1979), A minimum linear arrangement algorithm forundirected trees,SIAM Journal on Computing, 8(1):15-32,doi:10.1137/0208002.
  • 34. Daniel SLEATOR and Davy TEMPERLEY (1993), Parsing English with a linkgrammar, inProceedings of the Third International Workshop on ParsingTechnologies (IWPT’93), pp. 277-292, ACL/SIGPARSE,https://dblp.uni-trier.de/db/journals/corr/corr9508.html#abs-cmp-lg-9508004.
  • 35. David TEMPERLEY (2008), Dependency-length minimization in natural andartificial languages,Journal of Quantitative Linguistics, 15(3):256-282,doi:10.1080/09296170802159512.
  • 36. David TEMPERLEY and Daniel GILDEA (2018), Minimizing syntacticdependency lengths: Typological/cognitive universal?,Annual Review ofLinguistics, 4(1):67-80, doi:10.1146/annurev-linguistics-011817-045617.
  • 37. Himanshu YADAV, Samar HUSAIN, and Richard FUTRELL (2022), Assessingcorpus evidence for formal and psycholinguistic constraints on nonprojectivity,Computational Linguistics, pp. 1-27, doi:10.1162/coli_a_00437.
  • 38. Daniel ZEMAN, Joakim NIVRE, Mitchell ABRAMS, Elia ACKERMANN, Noëmi AEPLI, Željko AGIĆ, Lars AHRENBERG, Chika Kennedy AJEDE, Gabrielė ALEKSANDRAVIČIŪTĖ, Lene ANTONSEN, Katya APLONOVA, Angelina AQUINO,Maria Jesús ARANZABE, Gashaw ARUTIE, Masayuki ASAHARA, Luma ATEYAH, Furkan ATMACA, Mohammed ATTIA, Aitziber ATUTXA, Liesbeth AUGUSTINUS,Elena BADMAEVA, Miguel BALLESTEROS, Esha BANERJEE, Sebastian BANK,Verginica BARBU MITITELU, Victoria BASMOV, Colin BATCHELOR, JohnBAUER, Kepa BENGOETXEA, Yevgeni BERZAK, Irshad Ahmad BHAT, Riyaz Ahmad BHAT, Erica BIAGETTI, Eckhard BICK, Agnė BIELINSKIENĖ, Rogier BLOKLAND, Victoria BOBICEV, Loïc BOIZOU, Emanuel BORGES VÖLKER, Carl BÖRSTELL, Cristina BOSCO, Gosse BOUMA, Sam BOWMAN, Adriane BOYD, Kristina BROKAITĖ, Aljoscha BURCHARDT, Marie CANDITO, Bernard CARON, Gauthier CARON, Tatiana CAVALCANTI, Gülşen CEBIROĞLU ERYIĞIT, Flavio Massimiliano CECCHINI, Giuseppe G. A. CELANO, Slavomír ČÉPLÖ,Savas CETIN, Fabricio CHALUB, Ethan CHI, Jinho CHOI, Yongseok CHO, Jayeol CHUN, Alessandra T. CIGNARELLA, Silvie CINKOVÁ, Aurélie COLLOMB, Çağrı ÇÖLTEKIN, Miriam CONNOR, Marine COURTIN, Elizabeth DAVIDSON, Marie-Catherine DE MARNEFFE, Valeria DE PAIVA, Elvis DE SOUZA, Arantza DIAZ DE ILARRAZA, Carly DICKERSON, Bamba DIONE, Peter DIRIX, Kaja DOBROVOLJC, Timothy DOZAT, Kira DROGANOVA, Puneet DWIVEDI, Hanne ECKHOFF, Marhaba ELI, Ali ELKAHKY, Binyam EPHREM, Olga ERINA, Tomaž ERJAVEC, Aline ETIENNE, Wograine EVELYN, Richárd FARKAS, Hector FERNANDEZ ALCALDE, Jennifer FOSTER, Cláudia FREITAS, Kazunori FUJITA, Katarína GAJDOŠOVÁ, Daniel GALBRAITH, Marcos GARCIA, MoaGÄRDENFORS, Sebastian GARZA, Kim GERDES, Filip GINTER, Iakes GOENAGA,Koldo GOJENOLA, Memduh GÖKıRMAK, Yoav GOLDBERG, XavierGÓMEZ GUINOVART, Berta GONZÁLEZ SAAVEDRA, Bernadeta GRICIŪTĖ,Matias GRIONI, Loïc GROBOL, Normunds GRŪZĪTIS, Bruno GUILLAUME, CélineGUILLOT-BARBANCE, Tunga GÜNGÖR, Nizar HABASH, Jan HAJIČ, JanHAJIČ JR., Mika HÄMÄLÄINEN, Linh HÀ MỸ, Na-Rae HAN, Kim HARRIS, DagHAUG, Johannes HEINECKE, Oliver HELLWIG, Felix HENNIG, Barbora HLADKÁ,Jaroslava HLAVÁČOVÁ, Florinel HOCIUNG, Petter HOHLE, Jena HWANG,Takumi IKEDA, Radu ION, Elena IRIMIA, Ọlájídé ISHOLA, Tomáš JELÍNEK,Anders JOHANNSEN, Hildur JÓNSDÓTTIR, Fredrik JØRGENSEN, MarkusJUUTINEN, Hüner KAŞıKARA, Andre KAASEN, Nadezhda KABAEVA, SylvainKAHANE, Hiroshi KANAYAMA, Jenna KANERVA, Boris KATZ, TolgaKAYADELEN, Jessica KENNEY, Václava KETTNEROVÁ, Jesse KIRCHNER, ElenaKLEMENTIEVA, Arne KÖHN, Abdullatif KÖKSAL, Kamil KOPACEWICZ, TimoKORKIAKANGAS, Natalia KOTSYBA, Jolanta KOVALEVSKAITĖ, Simon KREK,Sookyoung KWAK, Veronika LAIPPALA, Lorenzo LAMBERTINO, Lucia LAM,Tatiana LANDO, Septina Dian LARASATI, Alexei LAVRENTIEV, John LEE,Phương LÊ Ĥ̀ONG, Alessandro LENCI, Saran LERTPRADIT, Herman LEUNG, Maria LEVINA, Cheuk Ying LI, Josie LI, Keying LI, Kyung Tae LIM, Yuan LI, Nikola LJUBEŠIĆ, Olga LOGINOVA, Olga LYASHEVSKAYA, Teresa LYNN, Vivien MACKETANZ, Aibek MAKAZHANOV, Michael MANDL, Christopher MANNING,Ruli MANURUNG, Cătălina MĂRĂNDUC, David MAREČEK, Katrin MARHEINECKE, Héctor MARTÍNEZ ALONSO, André MARTINS, Jan MAŠEK, Hiroshi MATSUDA, Yuji MATSUMOTO, Ryan MCDONALD, Sarah MCGUINNESS, Gustavo MENDONÇA, Niko MIEKKA, Margarita MISIRPASHAYEVA, Anna MISSILÄ, Cătălin MITITELU, Maria MITROFAN, Yusuke MIYAO, Simonetta MONTEMAGNI, Amir MORE, Laura MORENO ROMERO, Keiko Sophie MORI,Tomohiko MORIOKA, Shinsuke MORI, Shigeki MORO, Bjartur MORTENSEN,Bohdan MOSKALEVSKYI, Kadri MUISCHNEK, Robert MUNRO, Yugo MURAWAKI, Kaili MÜÜRISEP, Pinkey NAINWANI, Juan Ignacio NAVARRO HORÑIACEK, Anna NEDOLUZHKO, Gunta NEŠPORE-BĒRZKALNE, Lương NGUŶ̃EN THỊ, Huŷ̀en NGUŶ̃EN THỊ MINH, Yoshihiro NIKAIDO, VitalyNIKOLAEV, Rattima NITISAROJ, Hanna NURMI, Stina OJALA, Atul Kr. OJHA, Adédayọ̀ OLÚÒKUN, Mai OMURA, Emeka ONWUEGBUZIA, Petya OSENOVA, Robert ÖSTLING, Lilja ØVRELID, Şaziye Betül ÖZATEŞ, Arzucan ÖZGÜR, Balkız ÖZTÜRK BAŞARAN, Niko PARTANEN, Elena PASCUAL, Marco PASSAROTTI, Agnieszka PATEJUK, Guilherme PAULINO-PASSOS, Angelika PELJAK-ŁAPIŃSKA, Siyao PENG, Cenel-Augusto PEREZ, Guy PERRIER, Daria PETROVA, Slav PETROV, Jason PHELAN, Jussi PIITULAINEN, Tommi A PIRINEN, Emily PITLER, Barbara PLANK, Thierry POIBEAU, Larisa PONOMAREVA, Martin POPEL, Lauma PRETKALNIŅA, Sophie PRÉVOST, Prokopis PROKOPIDIS, Adam PRZEPIÓRKOWSKI, Tiina PUOLAKAINEN, Sampo PYYSALO, Peng QI, Andriela RÄÄBIS, Alexandre RADEMAKER, Loganathan RAMASAMY, Taraka RAMA, Carlos RAMISCH, Vinit RAVISHANKAR, Livy REAL,Petru REBEJA, Siva REDDY, Georg REHM, Ivan RIABOV, Michael RIEẞLER Erika RIMKUTĖ, Larissa RINALDI, Laura RITUMA, Luisa ROCHA, Mykhailo ROMANENKO, Rudolf ROSA, Valentin ROŞCA, Davide ROVATI, Olga RUDINA, Jack RUETER, Shoval SADDE, Benoît SAGOT, Shadi SALEH, Alessio SALOMONI,Tanja SAMARDŽIĆ, Stephanie SAMSON, Manuela SANGUINETTI, Dage SÄRG,Baiba SAULĪTE, Yanin SAWANAKUNANON, Salvatore SCARLATA, Nathan SCHNEIDER, Sebastian SCHUSTER, Djamé SEDDAH, Wolfgang SEEKER, Mojgan SERAJI, Mo SHEN, Atsuko SHIMADA, Hiroyuki SHIRASU, Muh SHOHIBUSSIRRI, Dmitry SICHINAVA, Aline SILVEIRA, Natalia SILVEIRA, Maria SIMI, Radu SIMIONESCU, Katalin SIMKÓ, Mária ŠIMKOVÁ, Kiril SIMOV, Maria SKACHEDUBOVA, Aaron SMITH, Isabela SOARES-BASTOS, Carolyn SPADINE,Antonio STELLA, Milan STRAKA, Jana STRNADOVÁ, Alane SUHR, Umut SULUBACAK, Shingo SUZUKI, Zsolt SZÁNTÓ, Dima TAJI, Yuta TAKAHASHI, Fabio TAMBURINI, Takaaki TANAKA, Samson TELLA, Isabelle TELLIER,Guillaume THOMAS, Liisi TORGA, Marsida TOSKA, Trond TROSTERUD, Anna TRUKHINA, Reut TSARFATY, Utku TÜRK, Francis TYERS, Sumire UEMATSU, Roman UNTILOV, Zdeňka UREŠOVÁ, Larraitz URIA, Hans USZKOREIT, Andrius UTKA, Sowmya VAJJALA, Daniel VAN NIEKERK, Gertjan VAN NOORD, Viktor VARGA, Eric VILLEMONTE DE LA CLERGERIE, Veronika VINCZE, Aya WAKASA, Lars WALLIN, Abigail WALSH, Jing Xian WANG, Jonathan North WASHINGTON, Maximilan WENDT, Paul WIDMER, Seyi WILLIAMS, Mats WIRÉN, Christian WITTERN, Tsegay WOLDEMARIAM, Tak-sum WONG, Alina WRÓBLEWSKA, Mary YAKO, Kayo YAMASHITA, Naoki YAMAZAKI, Chunxiao YAN, Koichi YASUOKA, Marat M. YAVRUMYAN, Zhuoran YU, Zdeněk ŽABOKRTSKÝ, Amir ZELDES, Hanzhi ZHU, and Anna ZHURAVLEVA (2020), Universal Dependencies 2.6,http://hdl.handle.net/11234/1-3226,LINDAT/CLARIAH-CZ digital library at the Institute of Formal and AppliedLinguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
Uwagi
Opracowanie rekordu ze środków MNiSW, umowa nr SONP/SP/546092/2022 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2024).
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-6bbc93c1-316e-4b81-a21f-2910fac208fc
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.