Text : now in 2D! A framework for lexical expansion with contextual similarity

Biemann, C.; Riedl, M.

doi:10.15398/jlm.v1i1.60

Article - details

Article title

Text : now in 2D! A framework for lexical expansion with contextual similarity

Authors

Biemann C., Riedl M.

Content

Full texts:

Biemann_Text now in 2D! A framework_1_2013.pdf

Identifiers

DOI

10.15398/jlm.v1i1.60

Abstracts

A new metaphor of two-dimensional text for data-driven semantic modeling of natural language is proposed, which provides an entirely new angle on the representation of text: not only syntagmatic relations are annotated in the text, but also paradigmatic relations are made explicit by generating lexical expansions. We operationalize distributional similarity in a general framework for large corpora, and describe a new method to generate similar terms in context. Our evaluation shows that distributional similarity is able to produce high-quality lexical resources in an unsupervised and knowledge-free way, and that our highly scalable similarity measure yields better scores in a WordNet-based evaluation than previous measures for very large corpora. Evaluating on a lexical substitution task, we find that our contextualization method improves over a non-contextualized baseline across all parts of speech, and we show how the metaphor can be applied successfully to part-of-speech tagging. A number of ways to extend and improve the contextualization method within our framework are discussed. As opposed to comparable approaches, our framework defines a model of lexical expansions in context that can generate the expansions rather than ranking a given list, and thus does not require existing lexical-semantic resources.

Keywords

distributional semantics; lexical expansion; contextual similarity; lexical substitution; computational semantics

Publisher

Instytut Podstaw Informatyki PAN

Journal

Journal of Language Modelling

Year

2013

Volume

Vol. 1, No. 1

Pages

55-95

Physical description

Bibliography: 70 items, figures, tables, charts.

Contributors

author

Biemann C.

biem@cs.tu-darmstadt.de

Computer Science Department, FG Language Technology, TU Darmstadt, Germany

author

Riedl M.

riedl@cs.tu-darmstadt.de

Computer Science Department, FG Language Technology, TU Darmstadt, Germany

Bibliography

[1] Michele Banko and Eric Brill (2001), Scaling to very very large corpora for natural language disambiguation, in Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, ACL ’01, pp. 26-33, Association for Computational Linguistics, Stroudsburg, PA, USA, http://dx.doi.org/10.3115/1073012.1073017.
[2] Marco Baroni and Alessandro Lenci (2010), Distributional memory: A general framework for corpus-based semantics, Computational Linguistics, 36 (4): 673-721, ISSN 0891-2017, http://dx.doi.org/10.1162/coli_a_00016.
[3] Marco Baroni and Roberto Zamparelli (2010), Nouns are vectors, adjectives are matrices: representing adjective-noun constructions in semantic space, in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pp. 1183-1193, Cambridge, Massachusetts, http://dl.acm.org/citation.cfm?id=1870658.1870773.
[4] Chris Biemann (2009), Unsupervised Part-of-Speech Tagging in the Large, Research on Language and Computation, 7 (2-4): 101-135, ISSN 1570-7075, http://dx.doi.org/10.1007/s11168-010-9067-9.
[5] Chris Biemann (2010), Co-occurrence cluster features for lexical substitutions in context, in Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing, TextGraphs-5, pp. 55-59, ISBN 978-1-932432-77-0, http://dl.acm.org/citation.cfm?id=1870490.1870499.
[6] Chris Biemann and Eugenie Giesbrecht (2011), Distributional Semantics and Compositionality 2011: Shared Task Description and Results, in Proceedings of the Workshop on Distributional Semantics and Compositionality, pp. 21-28, Association for Computational Linguistics, Portland, Oregon, USA, http://www.aclweb.org/anthology/W11-1304.
[7] Chris Biemann, Uwe Quasthoff, Gerhard Heyer, and Florian Holz (2008), ASV Toolbox: a Modular Collection of Language Exploration Tools, in Proceedings of the International Conference on Language Resources and Evaluation, LREC 2008, Marrakech, Morocco, http://www.lrec-conf.org/proceedings/lrec2008/summaries/447.html.
[8] Chris Biemann, Stefanie Roos, and Karsten Weihe (2012), Quantifying Semantics Using Complex Network Analysis, in Proceedings of the 24th International Conference on Computational Linguistics (COLING), Mumbai, India, http://aclweb.org/anthology/C/C12/C12-1017.pdf.
[9] David M. Blei, Andrew Y. Ng, and Michael I. Jordan (2003), Latent Dirichlet allocation, Journal of Machine Learning Research, 3: 993-1022, ISSN 1532-4435, http://dl.acm.org/citation.cfm?id=944919.944937.
[10] Stefan Bordag (2008), A comparison of co-occurrence and similarity measures as simulations of context, in CICLing’08 Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing, pp. 52-63, Haifa, Israel, http://dl.acm.org/citation.cfm?id=1787578.1787584.
[11] Jordan Boyd-Graber and David M. Blei (2008), Syntactic Topic Models, in Neural Information Processing Systems, Vancouver, British Columbia, http://www.cs.princeton.edu/~blei/papers/Boyd-GraberBlei2009.pdf.
[12] Kenneth Ward Church and Patrick Hanks (1990), Word association norms, mutual information, and lexicography, Computational Linguistics, 16 (1): 22-29, ISSN 0891-2017, http://dl.acm.org/citation.cfm?id=89086.89095.
[13] Michael Collins (2002), Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms, in Proceedings of the ACL-02 conference on Empirical methods in natural language processing – Volume 10, EMNLP ’02, pp. 1-8, Association for Computational Linguistics, Stroudsburg, PA, USA, http://dx.doi.org/10.3115/1118693.1118694.
[14] James R. Curran (2002), Ensemble methods for automatic thesaurus extraction, in Proceedings of the ACL-02 conference on Empirical methods in natural language processing – Volume 10, EMNLP ’02, pp. 222-229, http://dx.doi.org/10.3115/1118693.1118722.
[15] James R. Curran (2004), From Distributional to Semantic Similarity, University of Edinburgh, http://books.google.de/books?id=2iDbSAAACAAJ.
[16] Ferdinand de Saussure (1916), Cours de linguistique générale, Payot, Paris, http://www.bibsonomy.org/bibtex/2e68b895a274b9569189c5ae98db84603/jntr.
[17] Ferdinand de Saussure (1959), Course in general linguistics, Language (Philosophical Library), Philosophical Library, http://books.google.de/books?id=FSpZAAAAMAAJ.
[18] Jeffrey Dean and Sanjay Ghemawat (2004), MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of Operating Systems, Design & Implementation (OSDI) ’04, pp. 137-150, San Francisco, CA, USA, http://doi.acm.org/10.1145/1327452.1327492.
[19] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990), Indexing by latent semantic analysis, Journal of the American Society for Information Science, 41 (6): 391-407, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.8490.
[20] Inderjit S. Dhillon (2001), Co-clustering documents and words using bipartite spectral graph partitioning, in Proceedings of the seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’01, pp. 269-274, ACM, New York, NY, USA, ISBN 1-58113-391-X, http://doi.acm.org/10.1145/502512.502550.
[21] Ted Dunning (1993), Accurate methods for the statistics of surprise and coincidence, Computational Linguistics, 19 (1): 61-74, ISSN 0891-2017, http://dl.acm.org/citation.cfm?id=972450.972454.
[22] Katrin Erk and Sebastian Padó (2008), A structured vector space model for word meaning in context, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pp. 897-906, Honolulu, Hawaii, http://dl.acm.org/citation.cfm?id=1613715.1613831.
[23] Stefan Evert (2005), The Statistics of Word Cooccurrences: Word Pairs and Collocations, Ph.D. thesis, Institut für Maschinelle Sprachverarbeitung, University of Stuttgart, http://elib.uni-stuttgart.de/opus/volltexte/2005/2371/.
[24] Eugenie Giesbrecht (2009), In Search of Semantic Compositionality in Vector Spaces, in Proceedings of the 17th International Conference on Conceptual Structures: Conceptual Structures: Leveraging Semantic Technologies, ICCS ’09, pp. 173-184, Springer-Verlag, Berlin, Heidelberg, ISBN 978-3-642-03078-9, http://dx.doi.org/10.1007/978-3-642-03079-6_14.
[25] Kevin Gimpel, Nathan Schneider, Brendan O’Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith (2011), Part-of-speech tagging for Twitter: annotation, features, and experiments, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers – Volume 2, HLT ’11, pp. 42-47, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-932432-88-6, http://dl.acm.org/citation.cfm?id=2002736.2002747.
[26] Gene H. Golub and William M. Kahan (1965), Calculating the singular values and pseudo-inverse of a matrix, Journal of the Society for Industrial and Applied Mathematics: Series B: Numerical Analysis, 2: 205-224, http://www.citeulike.org/user/rabio/article/2342309.
[27] James Gorman and James R. Curran (2006), Scaling Distributional Similarity to Large Corpora, in Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 361-368, Association for Computational Linguistics, Sydney, Australia, http://www.aclweb.org/anthology/P06-1046.
[28] Amit Goyal, Hal Daumé III, and Graham Cormode (2012), Sketch Algorithms for Estimating Point Queries in NLP, in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1093-1103, Association for Computational Linguistics, http://www.aclweb.org/anthology/D12-1100.
[29] Emiliano Guevara (2011), Computing semantic compositionality in distributional semantics, in Proceedings of the Ninth International Conference on Computational Semantics, IWCS ’11, pp. 135-144, Association for Computational Linguistics, Stroudsburg, PA, USA, http://dl.acm.org/citation.cfm?id=2002669.2002684.
[30] Zellig S. Harris (1951), Methods in Structural Linguistics, University of Chicago Press, Chicago, http://archive.org/details/structurallingui00harr.
[31] W. Keith Hastings (1970), Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57 (1): 97-109, ISSN 1464-3510, doi:10.1093/biomet/57.1.97, http://dx.doi.org/10.1093/biomet/57.1.97.
[32] Marti A. Hearst (1992), Automatic acquisition of hyponyms from large text corpora, in Proceedings of the 14th conference on Computational linguistics – Volume 2, COLING ’92, pp. 539-545, http://dx.doi.org/10.3115/992133.992154.
[33] Enrique Henestroza Anguiano and Pascal Denis (2011), FreDist: Automatic construction of distributional thesauri for French, in TALN – 18ème conférence sur le traitement automatique des langues naturelles, pp. 119-124, Montpellier, France, http://hal.inria.fr/hal-00602004.
[34] Thomas Hofmann (1999), Probabilistic latent semantic indexing, in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’99, pp. 50-57, ACM, New York, NY, USA, ISBN 1-58113-096-1, http://doi.acm.org/10.1145/312624.312649.
[35] Adam Kilgarriff, Pavel Rychly, Pavel Smrz, and David Tugwell (2004), The Sketch Engine, in Proceedings of EURALEX, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.180.7984.
[36] Walter Kintsch (2001), Predication, Cognitive Science, 25 (2): 173-202, ISSN 1551-6709, http://dx.doi.org/10.1207/s15516709cog2502_1.
[37] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira (2001), Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, in Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pp. 282-289, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, ISBN 1-55860-778-1, http://dl.acm.org/citation.cfm?id=645530.655813.
[38] Lillian Lee (1999), Measures of distributional similarity, in Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, ACL ’99, pp. 25-32, College Park, Maryland, ISBN 1-55860-609-3, http://dx.doi.org/10.3115/1034678.1034693.
[39] Michael Lesk (1986), Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, in Proceedings of the 5th Annual International Conference on Systems Documentation, SIGDOC ’86, pp. 24-26, ACM, New York, NY, USA, ISBN 0-89791-224-1, http://doi.acm.org/10.1145/318723.318728.
[40] Dekang Lin (1998), Automatic retrieval and clustering of similar words, in Proceedings of the 17th International Conference on Computational Linguistics – Volume 2, COLING ’98, pp. 768-774, http://dx.doi.org/10.3115/980432.980696.
[41] Dekang Lin, Shaojun Zhao, Lijuan Qin, and Ming Zhou (2003), Identifying synonyms among distributionally similar words, in Proceedings of the 18th International Joint Conference on Artificial Intelligence, IJCAI’03, pp. 1492-1493, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, http://dl.acm.org/citation.cfm?id=1630659.1630908.
[42] Jimmy Lin and Chris Dyer (2010), Data-Intensive Text Processing with MapReduce, Morgan & Claypool Publishers, San Rafael, CA, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.169.6896.
[43] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini (1993), Building a large annotated corpus of English: the Penn Treebank, Computational Linguistics, 19 (2): 313-330, ISSN 0891-2017, http://dl.acm.org/citation.cfm?id=972470.972475.
[44] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning (2006), Generating typed dependency parses from phrase structure parses, in Proceedings of the International Conference on Language Resources and Evaluation, LREC 2006, Genova, Italy, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.229.775.
[45] Diana McCarthy and Roberto Navigli (2009), The English lexical substitution task, Language Resources and Evaluation, 43 (2): 139-159, http://dblp.uni-trier.de/db/journals/lre/lre43.html#McCarthyN09.
[46] George A. Miller and Walter G. Charles (1991), Contextual correlates of semantic similarity, Language and Cognitive Processes, 6 (1): 1-28, http://dx.doi.org/10.1080/01690969108406936.
[47] Tristan Miller, Chris Biemann, Torsten Zesch, and Iryna Gurevych (2012), Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation, in Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pp. 1781-1796, Mumbai, India, http://aclweb.org/anthology/C/C12/C12-1109.pdf.
[48] Jeff Mitchell and Mirella Lapata (2008), Vector-based Models of Semantic Composition, in Proceedings of ACL-08: HLT, pp. 236-244, Columbus, Ohio, www.aclweb.org/anthology/P08-1028.pdf.
[49] Sebastian Padó and Mirella Lapata (2007), Dependency-based construction of semantic space models, Computational Linguistics, 33 (2): 161-199, http://citeseer.uark.edu:8080/citeseerx/viewdoc/summary?doi=10.1.1.86.2026.
[50] Sebastian Padó and Yves Peirsman, editors (2011), Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, Association for Computational Linguistics, Edinburgh, UK, http://www.aclweb.org/anthology/W11-25.
[51] Robert Parker, David Graff, Junbo Kong, Ke Chen, and Kazuaki Maeda (2011), English Gigaword Fifth Edition, Linguistic Data Consortium, Philadelphia, http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2011T07.
[52] Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi (2004), WordNet::Similarity: measuring the relatedness of concepts, in Demonstration Papers at HLT-NAACL 2004, HLT-NAACL – Demonstrations ’04, pp. 38-41, http://dl.acm.org/citation.cfm?id=1614025.1614037.
[53] Fernando Pereira, Naftali Tishby, and Lillian Lee (1993), Distributional clustering of English words, in Proceedings of the 31st annual meeting on Association for Computational Linguistics, ACL ’93, pp. 183-190, Association for Computational Linguistics, Stroudsburg, PA, USA, http://dx.doi.org/10.3115/981574.981598.
[54] Reinhard Rapp (2003), Word sense discovery based on sense descriptor dissimilarity, in Proceedings of the Ninth Machine Translation Summit, pp. 315-322, http://www.citeulike.org/user/briordan/article/2911465.
[55] Matthias Richter, Uwe Quasthoff, Erla Hallsteinsdóttir, and Chris Biemann (2006), Exploiting the Leipzig Corpora Collection, in Proceedings of the IS-LTC 2006, Ljubljana, Slovenia, http://nl.ijs.si/is-ltc06/proc/13_Richter.pdf.
[56] Herbert Rubenstein and John B. Goodenough (1965), Contextual correlates of synonymy, Communications of the ACM, 8 (10): 627-633, ISSN 0001-0782, http://doi.acm.org/10.1145/365628.365657.
[57] Gerda Ruge (1992), Experiments on linguistically-based term associations, Information Processing & Management, 28 (3): 317-332, ISSN 0306-4573, http://www.sciencedirect.com/science/article/pii/030645739290078E.
[58] Magnus Sahlgren (2006), The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces, Ph.D. thesis, Stockholm University, http://soda.swedish-ict.se/437/.
[59] Helmut Schmid (1995), Improvements in Part-of-Speech Tagging with an Application to German, in Proceedings of the ACL SIGDAT-Workshop, pp. 47-50, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.2255.
[60] Hinrich Schütze (1993), Word Space, in Advances in Neural Information Processing Systems 5, pp. 895-902, Morgan Kaufmann, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.8856.
[61] Hinrich Schütze (1998), Automatic word sense discrimination, Computational Linguistics, 24 (1): 97-123, ISSN 0891-2017, http://dl.acm.org/citation.cfm?id=972719.972724.
[62] Anders Søgaard (2011), Semisupervised condensed nearest neighbor for part-of-speech tagging, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers – Volume 2, HLT’11, pp. 48-52, Portland, Oregon, ISBN 978-1-932432-88-6, http://dl.acm.org/citation.cfm?id=2002736.2002748.
[63] György Szarvas, Chris Biemann, and Iryna Gurevych (2013), Supervised All-Words Lexical Substitution using Delexicalized Features, in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-2013), Atlanta, GA, USA, http://aclweb.org/anthology/N/N13/N13-1133.pdf.
[64] Ming Tan, Wenli Zhou, Lei Zheng, and Shaojun Wang (2012), A scalable distributed syntactic, semantic, and lexical language model, Computational Linguistics, 38 (3): 631-671, ISSN 0891-2017, http://dx.doi.org/10.1162/COLI_a_00107.
[65] Stefan Thater, Hagen Fürstenau, and Manfred Pinkal (2011), Word Meaning in Context: A Simple and Effective Vector Model, in Proceedings of 5th International Joint Conference on Natural Language Processing, pp. 1134-1143, Asian Federation of Natural Language Processing, Chiang Mai, Thailand, http://www.aclweb.org/anthology/I11-1127.
[66] Peter D. Turney and Michael L. Littman (2005), Corpus-based Learning of Analogies and Semantic Relations, Machine Learning, 60 (1-3): 251-278, ISSN 0885-6125, http://dx.doi.org/10.1007/s10994-005-0913-1.
[67] Peter D. Turney and Patrick Pantel (2010), From frequency to meaning: vector space models of semantics, Journal of Artificial Intelligence Research, 37 (1): 141-188, ISSN 1076-9757, http://dl.acm.org/citation.cfm?id=1861751.1861756.
[68] Andrew J. Viterbi (1967), Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory, 13 (2): 260-269, ISSN 0018-9448, doi:10.1109/TIT.1967.1054010, http://dx.doi.org/10.1109/TIT.1967.1054010.
[69] Julie Weeds (2003), Measures and Applications of Lexical Distributional Similarity, Ph.D. thesis, Department of Informatics, University of Sussex, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.10.538.
[70] Dominic Widdows and Beate Dorow (2002), A graph model for unsupervised lexical acquisition, in Proceedings of the 19th International Conference on Computational Linguistics – Volume 1, COLING’02, pp. 1-7, http://dx.doi.org/10.3115/1072228.1072342.

Notes

Record prepared with funds from MNiSW (Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2020).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-d5d012c3-466d-4bff-909a-55f9bef0cbf0