The logic and linguistic model for automatic extraction of collocation similarity

Khairova, N.; Petrasova, S.; Gautam, A. P. S.

Abstract

The article discusses the automatic identification of collocation similarity. Semantic analysis is one of the most advanced and most difficult NLP tasks. The main problem of semantic processing is resolving the polysemy and synonymy of linguistic units, and the task becomes more complicated for word collocations. The paper proposes a logical and linguistic model for automatically determining semantic similarity between collocations in the Ukrainian and English languages. The model formalizes the semantic equivalence of collocations by means of the semantic and grammatical characteristics of their collocates. The basic idea is that the morphological, syntactic and semantic characteristics of lexical units must be taken into account to identify collocation similarity. The basic mathematical tools of the model are logical-algebraic equations of the finite predicates algebra. The model examines verb-noun and noun-adjective collocations in Ukrainian and English, which consist of words belonging to the main parts of speech. It allows semantically equivalent collocations to be extracted from semi-structured and unstructured texts, and implementations of the model will make it possible to recognize such collocations automatically. Using the model can increase the effectiveness of natural language processing tasks such as information extraction, ontology generation, sentiment analysis and others.
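
The general idea of comparing collocations through the grammatical and semantic features of their collocates can be illustrated with a toy sketch. It is not the authors' model: the feature tags, the semantic class labels, the admissible patterns and every function name below are assumptions made for illustration. The only element taken from finite predicate algebra is the recognition predicate, which returns 1 when a variable equals a given constant and 0 otherwise.

```python
from dataclasses import dataclass


def recognize(value: str, constant: str) -> int:
    """Recognition predicate: 1 if value equals constant, else 0."""
    return int(value == constant)


@dataclass(frozen=True)
class Collocate:
    lemma: str
    pos: str        # hypothetical tag set: "V", "N", "ADJ"
    sem_class: str  # hypothetical semantic class label


@dataclass(frozen=True)
class Collocation:
    head: Collocate
    dependent: Collocate


# Hypothetical admissible (head POS, dependent POS) patterns:
# verb-noun and noun-adjective collocations.
PATTERNS = [("V", "N"), ("N", "ADJ")]


def pattern_predicate(c: Collocation) -> int:
    """Disjunction over patterns of conjunctions of recognition predicates."""
    return max(
        recognize(c.head.pos, head_pos) & recognize(c.dependent.pos, dep_pos)
        for head_pos, dep_pos in PATTERNS
    )


def similar(c1: Collocation, c2: Collocation) -> int:
    """1 if both collocations fit a pattern and their collocates share
    semantic classes position by position (an illustrative criterion)."""
    same_head = recognize(c1.head.sem_class, c2.head.sem_class)
    same_dep = recognize(c1.dependent.sem_class, c2.dependent.sem_class)
    return pattern_predicate(c1) & pattern_predicate(c2) & same_head & same_dep


# Hand-labelled examples; the semantic classes are invented for the demo.
en_vn = Collocation(Collocate("make", "V", "perform"),
                    Collocate("decision", "N", "decision"))
uk_vn = Collocation(Collocate("приймати", "V", "perform"),
                    Collocate("рішення", "N", "decision"))
en_na = Collocation(Collocate("decision", "N", "decision"),
                    Collocate("important", "ADJ", "importance"))

print(similar(en_vn, uk_vn), similar(en_vn, en_na))
```

In this sketch the predicate is a plain conjunction of 0/1 values, so adding further morphological or syntactic features would only mean adding more recognition predicates to the conjunction.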

Keywords

automatic extraction; identification of collocation similarity; finite predicates algebra; logical-algebraic equations; grammatical and semantic features

Publisher

Polish Academy of Sciences, Branch in Lublin

Journal

ECONTECHMOD : An International Quarterly Journal on Economics of Technology and Modelling Processes

Year

2015

Volume

Vol. 4, No. 4

Pages

43–48

Physical description

Bibliography: 20 items, figures, formulas.

Contributors

author

Khairova N.

nina_khajrova@yahoo.com

National Technical University "Kharkiv Polytechnic Institute"

author

Petrasova S.

National Technical University "Kharkiv Polytechnic Institute"

author

Gautam A. P. S.

National Technical University "Kharkiv Polytechnic Institute"

Bibliography

1. N. Khairova, G. Shepelyov, S. Petrasova. 2014. Evaluating effectiveness of linguistic technologies of knowledge identification in text collections. – Transactions on business and engineering intelligent. ITHEA. – Rzeszow – Sofia. 2014. – 71-75.
2. Thierry Poibeau, Horacio Saggion, Roman Yangarber (Eds.). 2008. Multilingual Information Extraction and Summarization. Proceedings of MMIES-2: the Second Workshop on Multi-source, Multilingual Information Extraction and Summarization, at COLING-2008: the 22nd International Conference on Computational Linguistics.
3. Bo Pang, Lillian Lee. 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr., 2 (1-2). 1–135.
4. Y. Burov. 2014. Business process modelling using ontological task models. Econtechmod. An international quarterly journal, Vol. 1, No. 1. 11–22.
5. V. Lytvyn. 2013. Design of intelligent decision support systems using ontological approach. Econtechmod. An international quarterly journal, Vol. 2, No. 1. 31–37.
6. V. Lytvyn, O. Semotuyk, O. Moroz. 2013. Definition of the semantic metrics on the basis of thesaurus of subject area. Econtechmod. An international quarterly journal, Vol. 2, No. 4. 47–51.
7. M. Ericsson, A. Wingkvist, and W. Löwe. 2012. Visualization of Text Clones in Technical Documentation. In Proceedings of the Swedish Chapter of Eurographics (SIGRAD). 79-82.
8. A. Wingkvist, M. Ericsson, and W. Löwe. 2011. Making Sense of Technical Information Quality: A Software-based Approach. Journal of Software Technology, 4(3). 12-18.
9. John Sinclair. 1991. Corpus, Concordance, Collocation. Oxford University Press, 179.
10. Christopher D. Manning, Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, 680.
11. V. Brezina, T. McEnery, S. Wattam. 2015. Collocations in context: a new perspective on collocation networks. International Journal of Corpus Linguistics, 20, 2. 139-173.
12. K. W. Church, P. Hanks. 1990. Word association norms, mutual information, and lexicography. – Computational Linguistics, 16(1). 22–29.
13. S. Evert, B. Krenn. 2001. Methods for the qualitative evaluation of lexical association measures – Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse. 188–195.
14. S. Koshcheeva, V. Zakharov. 2014. Comparing methods of automatic verb-noun collocation extraction. – Computational Models for Business and Engineering Domains. ITHEA. Rzeszow-Sofia. 158–171.
15. S. Evert. 2008. Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, article 58. 1212-1248.
16. P. Muller, N. Hathout, B. Gaume. 2006. Synonym Extraction Using a Semantic Distance on a Dictionary. Workshop on TextGraphs, at HLT-NAACL. Association for Computational Linguistics. 65–72.
17. Hua Wu, Ming Zhou. 2003. Synonymous Collocation Extraction Using Translation Information. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL '03), V. 1. 120-127.
18. D. Hindle, M. Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics, 19, 1. 103-120.
19. Bondarenko M., Shabanov-Kushnarenko J. 2007. The intelligence theory. Kharkiv: “SMIT”, 576. (In Russian).
20. N. Khairova, N. Sharonova. 2009. Use of Predicate Categories for Modelling of Operation of the Semantic Analyzer of the Linguistic Processor. Proceedings of IEEE East-West Design & Test Symposium (EWDTS'09). 204-207.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-9db73312-c526-4b48-9399-497fb80b3196