ArNLI: Arabic Natural Language Inference entailment and contradiction detection

Al Jallad, Khloud; Ghneim, Nada

doi:10.7494/csci.2023.24.2.4378

Artykuł - szczegóły

Tytuł artykułu

ArNLI: Arabic Natural Language Inference entailment and contradiction detection

Autorzy

Al Jallad Khloud , Ghneim Nada

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.7494/csci.2023.24.2.4378

Warianty tytułu

Języki publikacji

Abstrakty

Natural Language Inference (NLI) is a hot topic research in natural language processing, contradiction detection between sentences is a special case of NLI. This is considered a difficult NLP task which has a significant influence when added as a component in many NLP applications (such as question answering systems and text summarization). The Arabic language is one of the most challenging low-resources languages for detecting contradictions due to its rich lexical semantics ambiguity. We have created a data set of more than 12k sentences and named it ArNLI; it will be publicly available. Moreover, we have applied a new model that was inspired by Stanford's proposed contradiction-detection solutions for the English language. We proposed an approach for detecting contradictions between pairs of sentences in the Arabic language using a contradiction vector combined with a language model vector as an input to a machine-learning model. We analyzed the results of different traditional machine-learning classifiers and compared their results on our created data set (ArNLI) and on the automatic translation of both the PHEME and SICK English data sets. The best results were achieved by using the random forest classifier, with accuracies of 0.99, 0.60 and 0.75 on PHEME, SICK, and ArNLI respectively.

Słowa kluczowe

textual entailment Arabic NLP contradiction detection contradiction Arabic data set textual inference

Wydawca

Wydawnictwa AGH

Czasopismo

Computer Science

Rocznik

2023

Tom

T. 24 (2)

Strony

183--204

Opis fizyczny

Bibliogr. 52 poz., rys., tab., wykr.

Twórcy

autor

Al Jallad Khloud

k-aljallad@aiu.edu.sy

Arab International University, Faculty of Information Technology Engineering, Daraa, Syria

autor

Ghneim Nada

n-ghneim@aiu.edu.sy

Arab International University, Faculty of Information Technology Engineering, Daraa, Syria

Bibliografia

[1] AdaBoostClassifier scikit-learn.org, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble. AdaBoostClassifier.html. Accessed 11 11 2020.
[2] AL-Khawaldeh F.T.: A Study of the Effect of Resolving Negation and Sentiment Analysis in Recognizing Text Entailment for Arabic, World of Computer Science and Information Technology Journal (WCSIT), vol. 5(7), pp. 124–128, 2015.doi: 10.48550/ARXIV.1907.03871.
[3] Alabbas M.: A Dataset for Arabic Textual Entailment. In: Proceedings of the Student Research Workshop associated with RANLP 2013, pp. 7–13, INCOMA Ltd. Shoumen, BULGARIA, Hissar, Bulgaria, 2013. https://aclanthology.org/R13-2002.
[4] Alabbas M., Ramsay A.: Natural Language Inference for Arabic Using Extended Tree Edit Distance with Subtrees, Journal of Artificial Intelligence Research, vol. 48, pp. 1–22, 2013. doi: 10.1613/jair.3892.
[5] Almarwani N., Diab M.: Arabic Textual Entailment with Word Embeddings.In: Proceedings of the Third Arabic Natural Language Processing Workshop, pp. 185–190, Association for Computational Linguistics, Valencia, Spain, 2017.doi: 10.18653/v1/W17-1322.
[6] Amirkhani H., AzariJafari M., Pourjafari Z., Faridan-Jahromi S., Kouhkan Z., Amirak A.: FarsTail: A Persian Natural Language Inference Dataset, 2020.doi: 10.48550/ARXIV.2009.08820.
[7] ANli allenai, 2019, https://leaderboard.allenai.org/anli/submissions/get-started.Accessed 2021.
[8] Ben-Sghaier M., Bakari W., Neji M.: Classification and Analysis of Arabic Natural Language Inference Systems, Procedia Computer Science, vol. 176,pp. 551–560, 2020. doi: 10.1016/j.procs.2020.08.057.
[9] Bos J., Zanzotto F.M., Pennacchiotti M.: Textual Entailment at EVALITA 2009. In: Poster and Workshop Proceedings of the 11th Conference of the Italian Association for Artificial Intelligence, vol. 9, 2009.
[10] Boudaa T., Marouani M.E., Enneya N.: Alignment Based Approach for ArabicTextual Entailment,Procedia Computer Science, vol. 148, pp. 246–255, 2019.
[11] Budur E., Özçelik R., Güngör T., Potts C.: Data and Representation for TurkishNatural Language Inference. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8253–8267, 2020.doi: 10.18653/v1/2020.emnlp-main.662.
[12] Clark P.: Recognizing Textual Entailment, QA4MRE, and Machine Reading. In: P. Forner, J. Karlgren, C. Womser-Hacker (eds.),CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy, September 17–20, 2012, CEUR Workshop Proceedings, vol. 1178, CEUR-WS.org, 2012. http://ceur-ws.org/Vol-1178/CLEF2012wn-QA4MRE-Clark2012.pdf.
[13] DecisionTreeClassifier scikit-learn.org, https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html. Accessed 11.11.2020.
[14] Delmonte R., Bristot A., Boniforti M.A.P., Tonelli S.: Entailment and Anaphora Resolution in RTE3. In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, p. 48–53, RTE ’07, Association for Computational Linguistics, USA, 2007.
[15] Dolan W.B., Brockett C.: Automatically Constructing a Corpus of Sentential Paraphrases. In: Proceedings of the Third International Workshop on Paraphrasing (IWP2005), 2005. https://aclanthology.org/I05-5002.
[16] Eichler K., Gabryszak A., Neumann G.: An analysis of textual inference in German customer emails. In: Proceedings of the Third Joint Conference on Lexicaland Computational Semantics (*SEM 2014), pp. 69–74, Dublin, Ireland, August23–24, 2014.
[17] Fonseca E., Borges dos Santos L., Criscuolo M., Aluisio S.: Overview of the evaluation of semantic similarity and textual inference, LinguaMÁTICA, vol. 8(2),pp. 3–13, 2016.
[18] Harabagiu S., Hickl A., Lacatusu F.: Negation, Contrast and Contradiction in Text Processing. In: Proceedings of the 21st National Conference on Artificial Intelligence – Volume 1, pp. 755–762, AAAI’06, AAAI Press, 2006.
[19] Hayashibe Y.: Japanese Realistic Textual Entailment Corpus. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 6827–6834,European Language Resources Association, Marseille, France, 2020. https://aclanthology.org/2020.lrec-1.843.
[20] He P., Liu X., Gao J., Chen W.: DeBERTa: Decoding-enhanced BERT with Disentangled Attention., CoRR, 2020. https://arxiv.org/abs/2006.03654.
[21] Hu H., Richardson K., Xu L., Li L., Kuebler S., Moss L.S.: OCNLI: OriginalChinese Natural Language Inference, 2020. doi: 10.48550/ARXIV.2010.05444.
[22] Jurafsky D., Martin J.H.:Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall PTR, USA, 1st ed., 2000.
[23] Khader M., Awajan A.A., Al-Kouz A.: Textual Entailment for Arabic Language based on Lexical and Semantic Matching, International Journal of Computing,vol. 12, pp. 67–74, 2016.
[24] KNN Classifier scikit-learn.org, https://scikit-learn.org/stable/modules/generated/sklearn.neighbors. KNeighborsClassifier.html. Accessed 11.11.2020.
[25] Koreeda Y., Manning C.: ContractNLI: A Dataset for Document-level NaturalLanguage Inference for Contracts. In:Findings of the Association for Compu-tational Linguistics: EMNLP 2021, pp. 1907–1919, Association for Computa-tional Linguistics, Punta Cana, Dominican Republic, 2021. doi: 10.18653/v1/2021.findings-emnlp.164.
[26] Lendvai P., Augenstein I., Bontcheva K., Declerck T.: Monolingual SocialMedia Datasets for Detecting Contradiction and Entailment. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation(LREC’16), pp. 4602–4605, European Language Resources Association (ELRA), Portorož, Slovenia, 2016. https://aclanthology.org/L16-1729.
[27] Li L., Qin B., Liu T.: Contradiction Detection with Contradiction-Specific WordEmbedding, Algorithms, vol. 10(2), 59, 2017. doi: 10.3390/a10020059.
[28] Lingam V., Bhuria S., Nair M., Gurpreetsingh D., Goyal A., Sureka A.: Deep learning for conflicting statements detection in text, 2018. doi: 10.7287/peerj.preprints.26589v1.
[29] Lippi M., Torroni P.: Argumentation Mining: State of the Art and Emerging Trends, ACM Trans Internet Technol, vol. 16(2), 2016. doi: 10.1145/2850417.
[30] Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., Levy O.,et al.: RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019. arXiv:1907.11692.
[31] Liu Z., Lin Y., Sun M.: Representation Learning and NLP. In: Representation Learning for Natural Language Processing, pp. 1–11, Springer, Singapore, 2020.doi: 10.1007/978-981-15-5573-2_1.
[32] MacCartney B., Grenager T., de Marneffe M.C., Cer D., Manning C.D.: Learningto recognize features of valid textual entailments. In: Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pp. 41–48, Association for Computational Linguistics, New York City, USA, 2006. https://aclanthology.org/N06-1006.
[33] Marelli M., Bentivogli L., Baroni M., Bernardi R., Menini S., Zamparelli R.: SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entail-ment. In: Proceedings of the 8th International Workshop on Semantic Evaluation(SemEval 2014), pp. 1–8, Association for Computational Linguistics, Dublin, Ireland, 2014. doi: 10.3115/v1/S14-2001.
[34] Marneffe de M.C., Rafferty A.N., Manning C.D.: Finding Contradictions in Text.In: Proceedings of ACL-08: HLT, pp. 1039–1047, Association for Computational Linguistics, Columbus, Ohio, 2008. https://aclanthology.org/P08-1118.
[35] Microsoft Research Paraphrase Corpus Microsoft Research, 2005, https://www.microsoft.com/en-us/download/details.aspx?id=52398.. Accessed 2021.
[36] Mishra A., Patel D., Vijayakumar A., Li X., Kapanipathi P., Talamadupula K.:Reading Comprehension as Natural Language Inference: A Semantic Analysis.In:Proceedings of the Ninth Joint Conference on Lexical and Computational Se-mantics, pp. 12–19, Association for Computational Linguistics, Barcelona, Spain(Online), 2020. https://aclanthology.org/2020.starsem-1.2.
[37] Padó S., Galley M., Jurafsky D., Manning C.D.: Robust Machine Translation Evaluation with Entailment Features. In: K. Su, J. Su, J. Wiebe (eds.), ACL2009, Proceedings of the Joint Conference of the 47th Annual Meeting of theACL and the 4th International Joint Conference on Natural Language Processingof the AFNLP, pp. 297–305, The Association for Computer Linguistics, 2009.https://aclanthology.org/P09-1034/.
[38] RandomForestClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html. Accessed 11.11.2020.
[39] Ritter A., Downey D., Soderland S., Etzioni O.: It’s a Contradiction – no, it’s not: A Case Study using Functional Relations. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 11–20, Association for Computational Linguistics, Honolulu, Hawaii, 2008. https://aclanthology.org/D08-1002.
[40] Rocha G., Lopes Cardoso H.: Recognizing Textual Entailment: Challenges in the Portuguese Language,Information, vol. 9(4), 2018. doi: 10.3390/info9040076.
[41] Romanov A., Shivade C.: Lessons from Natural Language Inference in the Clinical Domain. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 1586–1596, Association for Computational Linguistics, Brussels, Belgium, 2018. doi: 10.18653/v1/D18-1187.
[42] Roth D., Sammons M., Vydiswaran V.: A Framework for Entailed RelationRecognition. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 57–60, Association for Computational Linguistics, Suntec, Singapore, 2009. https://aclanthology.org/P09-2015.
[43] Sammons M., Vydiswaran V., Roth D.: Recognizing Textual Entailment. In: D.M. Bikel, I. Zitouni (eds.), Multilingual Natural Language Applications: FromTheory to Practice, pp. 209–258, IBM Press, Pearson plc, 2012.
[44] SGD Classifier scikit-learn.org. https://scikit-learn.org/stable/modules/sgd.html. Accessed 11.11.2020.
[45] Şulea O.M.: Recognizing Textual Entailment in Twitter Using Word Embeddings.In: Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP, pp. 31–35, Association for Computational Linguistics, Copenhagen, Denmark, 2017. doi: 10.18653/v1/W17-5306.
[46] superGLUE NYU; FaceBook; DeepMind; UWNLP, 2019. https://super.gluebenchmark.com/. Accessed 2021.
[47] SVM Classifier scikit-learn.org. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html. Accessed 11.11.2020.
[48] Wang A., Pruksachatkun Y., Nangia N., Singh A., Michael J., Hill F., Levy O.,Bowman S.: SuperGLUE: A Stickier Benchmark for General-Purpose Laguage Understanding Systems. In: H. Wallach, H. Larochelle, A. Beygelzimer,F. d'Alché-Buc, E. Fox, R. Garnett (eds.), Advances in Neural Information Processing Systems, vol. 32, 2019. https://proceedings.neurips.cc/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf.
[49] Wang A., Singh A., Michael J., Hill F., Levy O., Bowman S.: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding.In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and In-terpreting Neural Networks for NLP, pp. 353–355, Association for Computational Linguistics, Brussels, Belgium, 2018. doi: 10.18653/v1/W18-5446.
[50] Wang S., Fang H., Khabsa M., Mao H., Ma H.: Entailment as Few-Shot Learner, 2021. doi: 10.48550/ARXIV.2104.14690.
[51] Williams A., Nangia N., Bowman S.: A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017. arXiv:1704.05426.
[52] Yates A., Banko M., Broadhead M., Cafarella M., Etzioni O., Soderland S.: TextRunner: Open Information Extraction on the Web. In: Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pp. 25–26, Association for Computational Linguistics, Rochester, New York,USA, 2007. https://aclanthology.org/N07-4013

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-2b454c27-4e81-4d93-a27c-ddd11633550b