Named-entity recognition for Hindi language using context pattern-based maximum entropy

Jain, Arti; Yadav, Divakar; Arora, Anuja; Tayal, Devendra K.

doi:10.7494/csci.2022.23.1.3977

Article - details

Article title

Named-entity recognition for Hindi language using context pattern-based maximum entropy

Authors

Jain Arti, Yadav Divakar, Arora Anuja, Tayal Devendra K.

Identifiers

DOI

10.7494/csci.2022.23.1.3977

Abstract

This paper describes a named-entity recognition (NER) system for the Hindi language that uses two methodologies: an existing baseline maximum entropy-based named-entity (BL-MENE) model, and the proposed context pattern-based MENE (CP-MENE) framework. BL-MENE utilizes several baseline features for the NER task but suffers from inaccurate named-entity (NE) boundary detection, misclassification errors, and the partial recognition of NEs because certain essential features are missing. By contrast, CP-MENE incorporates extensive features and patterns that are designed to overcome these problems. CP-MENE's features include right-boundary, left-boundary, part-of-speech, synonym, gazetteer, and relative pronoun features. CP-MENE formulates a kind of recursive relationship for extracting highly ranked NE patterns that are generated through regular expressions in Python code. Since Hindi-language web content is growing (especially in health care applications), this work is conducted on the Hindi health data (HHD) corpus (which is readily available as a Kaggle dataset). The experiments were conducted on four NE categories: Person (PER), Disease (DIS), Consumable (CNS), and Symptom (SMP).
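The abstract states only that NE patterns are generated through regular expressions in Python and then ranked. As a rough illustration of that general idea, and not the authors' CP-MENE procedure, the sketch below collects the word immediately to the left and right of each annotated entity and ranks those context patterns by frequency. The inline tag format, the transliterated toy sentences, and the names `context_patterns` and `rank_patterns` are all assumptions made for this example; the recursive relationship mentioned in the abstract is not reproduced.

```python
import re
from collections import Counter

# Hypothetical toy data: transliterated sentences with inline entity tags.
# The real HHD corpus format is not described in this record.
SENTENCES = [
    "mujhe <DIS>bukhar</DIS> ho gaya hai",
    "use <DIS>malaria</DIS> ho gaya tha",
    "doctor ne <CNS>paracetamol</CNS> lene ko kaha",
    "bacche ko <DIS>khansi</DIS> ho gayi hai",
]

# Matches one tagged entity such as <DIS>bukhar</DIS>.
ENTITY_RE = re.compile(r"<(?P<tag>[A-Z]+)>(?P<text>.+?)</(?P=tag)>")
# Matches any opening or closing tag, used to strip markup from context.
TAG_RE = re.compile(r"</?[A-Z]+>")


def context_patterns(sentences, window=1):
    """Count left and right context words around each tagged entity."""
    counts = Counter()
    for sentence in sentences:
        for match in ENTITY_RE.finditer(sentence):
            tag = match.group("tag")
            left = TAG_RE.sub("", sentence[:match.start()]).split()[-window:]
            right = TAG_RE.sub("", sentence[match.end():]).split()[:window]
            if left:
                counts[(tag, "left", " ".join(left))] += 1
            if right:
                counts[(tag, "right", " ".join(right))] += 1
    return counts


def rank_patterns(counts, top_k=3):
    """Return the top_k most frequent (pattern, count) pairs."""
    return counts.most_common(top_k)


if __name__ == "__main__":
    for pattern, count in rank_patterns(context_patterns(SENTENCES)):
        print(pattern, count)
```

In this toy data the right-context word "ho" follows all three Disease mentions, so it ranks first. Counting left and right contexts separately loosely mirrors the left-boundary and right-boundary features named in the abstract; a real system would feed such patterns into the classifier as features rather than use raw counts alone.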

Keywords

context patterns; gazetteer lists; Hindi language; Kaggle dataset; maximum entropy; named-entity recognition; feature extension

Publisher

Wydawnictwa AGH

Journal

Computer Science

Year

2022

Volume

Vol. 23 (1)

Pages

81–115

Physical description

Bibliography: 141 items, figures, tables.

Contributors

author

Jain Arti

ajain.jiit@gmail.com

Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India

author

Yadav Divakar

divakar.yadav0@gmail.com

NIT Hamirpur, Himachal Pradesh, India

author

Arora Anuja

anuja.arora29@gmail.com

Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India

author

Tayal Devendra K.

dev_tayal2001@yahoo.com

Indira Gandhi Delhi Technical University for Women, New Delhi, India

Bibliography

[1] Abinaya N., John N., Ganesh B.H., Kumar A.M., Soman K.: AMRITA_CEN@FIRE-2014: Named Entity Recognition for Indian Languages using Rich Features. In: Proceedings of the Forum for Information Retrieval Evaluation, pp. 103–111, 2014. doi: 10.1145/2824864.2824882.
[2] Al-Rfou R., Kulkarni V., Perozzi B., Skiena S.: POLYGLOT-NER: Massive Multilingual Named Entity Recognition. In: Proceedings of the 2015 SIAM International Conference on Data Mining, pp. 586–594, SIAM, 2015.
[3] Alsaaran N., Alrabiah M.: Classical Arabic Named Entity Recognition Using Variant Deep Neural Network Architectures and BERT, IEEE Access, vol. 9, pp. 91537–91547, 2021.
[4] Asti L., Uguzzoni G., Marcatili P., Pagnani A.: Maximum-entropy models of sequenced immune repertoires predict antigen-antibody affinity, PLoS Computational Biology, vol. 12(4), p. e1004870, 2016.
[5] Athavale V., Bharadwaj S., Pamecha M., Prabhu A., Shrivastava M.: Towards Deep Learning in Hindi NER: An approach to tackle the Labelled Data Scarcity, arXiv preprint arXiv:1610.09756, 2016.
[6] Banawan K., Ulukus S.: The Capacity of Private Information Retrieval from Coded Databases, IEEE Transactions on Information Theory, vol. 64(3), pp. 1945–1956, 2018.
[7] Benajiba Y., Rosso P., Benedí Ruiz J.M.: ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy. In: International Conference on Intelligent Text Processing and Computational Linguistics, pp. 143–153, Springer, 2007.
[8] Bender O., Och F.J., Ney H.: Maximum entropy models for named entity recognition. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, pp. 148–151, 2003.
[9] Biswas S., Mishra M., Acharya S., Mohanty S.: A two stage language independent named entity recognition for Indian languages, (IJCSIT) International Journal of Computer Science and Information Technologies, vol. 1(4), pp. 285–289, 2010.
[10] Bontcheva K., Derczynski L., Roberts I.: Crowdsourcing named entity recognition and entity linking corpora. In: Handbook of Linguistic Annotation, pp. 875–892, Springer, 2017.
[11] Borthwick A., Sterling J., Agichtein E., Grishman R.: Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In: Sixth Workshop on Very Large Corpora, 1998.
[12] Carpuat M., Wu D.: Improving statistical machine translation using word sense disambiguation. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 61–72, 2007.
[13] Carreras X., Màrquez L., Padró L.: Learning a perceptron-based named entity chunker via online recognition feedback. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003, pp. 156–159, 2003.
[14] Charniak E.: A maximum-entropy-inspired parser. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics, 2000.
[15] Chatterjee N., Kaushik N.: RENT: Regular expression and NLP-based term extraction scheme for agricultural domain. In: Proceedings of the International Conference on Data Engineering and Communication Technology, pp. 511–522, Springer, 2017.
[16] Chen C., Kong F.: Enhancing Entity Boundary Detection for Better Chinese Named Entity Recognition. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Vol. 2: Short Papers), pp. 20–25, 2021.
[17] Chinchor N., Marsh E.: MUC-7 information extraction task definition. In: Proceedings of the Seventh Message Understanding Conference (MUC-7), Appendices, pp. 359–367, 1998.
[18] Chiticariu L., Krishnamurthy R., Li Y., Reiss F., Vaithyanathan S.: Domain adaptation of rule-based annotators for named-entity recognition tasks. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp. 1002–1012, 2010.
[19] Chiu J.P., Nichols E.: Named entity recognition with bidirectional LSTM-CNNs, Transactions of the Association for Computational Linguistics, vol. 4, pp. 357–370, 2016.
[20] Chopra D., Jahan N., Morwal S.: Hindi named entity recognition by aggregating rule based heuristics and hidden markov model, International Journal of Information, vol. 2(6), pp. 43–52, 2012.
[21] Cucerzan S., Yarowsky D.: Language independent named entity recognition combining morphological and contextual evidence. In: 1999 joint SIGDAT conference on empirical methods in natural language processing and very large corpora, 1999.
[22] Ekbal A., Bandyopadhyay S.: Named Entity Recognition in Indian Languages Using Maximum Entropy Approach, International Journal for Computer Processing of Languages (IJCPOL), vol. 21(3), pp. 205–237, 2008. doi: 10.1142/S1793840608001913.
[23] Ekbal A., Bandyopadhyay S.: A conditional random field approach for named entity recognition in Bengali and Hindi, Linguistic Issues in Language Technology, vol. 2(1), pp. 1–44, 2009.
[24] Ekbal A., Bandyopadhyay S.: Voted NER system using appropriate unlabeled data. In: Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009), pp. 202–210, 2009.
[25] Ekbal A., Bandyopadhyay S.: Named entity recognition using support vector machine: A language independent approach, International Journal of Electrical, Computer, and Systems Engineering, vol. 4(2), pp. 155–170, 2010.
[26] Ekbal A., Bandyopadhyay S.: Named entity recognition in Bengali and Hindi using support vector machine, Lingvisticæ Investigationes, vol. 34(1), pp. 35–67, 2011.
[27] Ekbal A., Saha S.: Classifier ensemble selection using genetic algorithm for named entity recognition, Research on Language and Computation, vol. 8(1), pp. 73–99, 2010.
[28] Ekbal A., Saha S.: Weighted vote based classifier ensemble selection using genetic algorithm for named entity recognition. In: International Conference on Application of Natural Language to Information Systems, pp. 256–267, Springer, 2010.
[29] Ekbal A., Saha S.: A multiobjective simulated annealing approach for classifier ensemble: Named entity recognition in Indian languages as case studies, Expert Systems with Applications, vol. 38(12), pp. 14760–14772, 2011.
[30] Ekbal A., Saha S.: Weighted vote-based classifier ensemble for named entity recognition: a genetic algorithm-based approach, ACM Transactions on Asian Language Information Processing (TALIP), vol. 10(2), pp. 1–37, 2011.
[31] Ekbal A., Saha S.: Multiobjective optimization for classifier ensemble and feature selection: an application to named entity recognition, International Journal on Document Analysis and Recognition (IJDAR), vol. 15(2), pp. 143–166, 2012.
[32] Ekbal A., Haque R., Bandyopadhyay S.: Maximum Entropy Based Bengali Part of Speech Tagging, A. Gelbukh (ed.), Advances in Natural Language Processing and Applications, Research in Computing Science (RCS) Journal, vol. 33, pp. 67–78, 2008.
[33] Ekbal A., Saha S., Hasanuzzaman M.: Multiobjective approach for feature selection in maximum entropy based named entity recognition. In: 2010 22nd IEEE International Conference on Tools with Artificial Intelligence, vol. 1, pp. 323–326, IEEE, 2010.
[34] Ekbal A., Saha S., Sikdar U.K.: On active annotation for named entity recognition, International Journal of Machine Learning and Cybernetics, vol. 7(4), pp. 623–640, 2016.
[35] Ekbal A., Saha S., Singh D.: Active machine learning technique for named entity recognition. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics, pp. 180–186, 2012.
[36] Ekbal A., Saha S., Singh D.: Ensemble based active annotation for named entity recognition. In: 2012 Third International Conference on Emerging Applications of Information Technology, pp. 331–334, IEEE, 2012.
[37] El-Halees A.M.: Arabic text classification using maximum entropy, IUG Journal of Natural Studies, vol. 15(1), 2015.
[38] Farmakiotou D., Karkaletsis V., Koutsias J., Sigletos G., Spyropoulos C.D., Stamatopoulos P.: Rule-based named entity recognition for Greek financial texts. In: Proceedings of the Workshop on Computational Lexicography and Multimedia Dictionaries (COMLEX 2000), pp. 75–78, Citeseer, 2000.
[39] Flood M., Grant J., Luo H., Raschid L., Soboroff I., Yoo K.: Financial entity identification and information integration (FEIII) challenge: the report of the organizing committee. In: Proceedings of the Second International Workshop on Data Science for Macro-Modeling, pp. 1–4, 2016.
[40] Fu R., Qin B., Liu T.: Generating Chinese named entity data from parallel corpora, Frontiers of Computer Science, vol. 8(4), pp. 629–641, 2014.
[41] Gali K., Surana H., Vaidya A., Shishtla P.M., Sharma D.M.: Aggregating machine learning and rule based heuristics for named entity recognition. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages, 2008.
[42] Gayen V., Sarkar K.: An HMM Based Named Entity Recognition System for Indian Languages: The JU System at ICON 2013, CoRR, vol. abs/1405.7397, 2014. http://arxiv.org/abs/1405.7397.
[43] Gella S., Sharma J., Bali K.: Query word labeling and Transliteration for Indian Languages: Shared task system description. In: Working Notes – Forum for Information Retrieval Evaluation (FIRE) 2013 Shared Task, 2013.
[44] Goodman J.: Sequential conditional generalized iterative scaling. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 9–16, 2002.
[45] Goyal A.: Named entity recognition for South Asian languages. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages, 2008.
[46] Guo H., Zhu H., Guo Z., Zhang X., Wu X., Su Z.: Domain adaptation with latent semantic association for named entity recognition. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 281–289, 2009.
[47] Gupta J., Tayal D.K., Gupta A.: A TENGRAM method based part-of-speech tagging of multi-category words in Hindi language, Expert Systems with Applications, vol. 38(12), pp. 15084–15093, 2011.
[48] Gupta P.K., Arora S.: An approach for named entity recognition system for Hindi: an experimental study, Proceedings of ASCNT–2009, CDAC, Noida, India, pp. 103–108, 2009.
[49] Gupta S., Bhattacharyya P.: Think globally, apply locally: using distributional characteristics for Hindi named entity identification. In: Proceedings of the 2010 Named Entities Workshop, pp. 116–125, 2010.
[50] Hamdi A., Linhares Pontes E., Boros E., Nguyen T.T.H., Hackl G., Moreno J.G., Doucet A.: A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2328–2334, 2021.
[51] Han A.L.F., Zeng X., Wong D.F., Chao L.S.: Chinese named entity recognition with graph-based semi-supervised learning model. In: Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing, pp. 15–20, 2015.
[52] Han N.R., Chodorow M., Leacock C.: Detecting Errors in English Article Usage with a Maximum Entropy Classifier Trained on a Large, Diverse Corpus. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), 2004. http://www.lrec-conf.org/proceedings/lrec2004/pdf/695.pdf.
[53] Han X., Kwoh C.K., Kim J.j.: Clustering based active learning for biomedical named entity recognition. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 1253–1260, IEEE, 2016.
[54] Hasanuzzaman M., Ekbal A., Bandyopadhyay S.: Maximum entropy approach for named entity recognition in Bengali and Hindi, International Journal of Recent Trends in Engineering, vol. 1(1), p. 408, 2009.
[55] Hasanuzzaman M., Saha S., Ekbal A.: Feature subset selection using genetic algorithm for named entity recognition. In: Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, pp. 153–162, 2010.
[56] Hayes B., Wilson C.: A maximum entropy model of phonotactics and phonotactic learning, Linguistic Inquiry, vol. 39(3), pp. 379–440, 2008.
[57] Hennig L., Truong P.T., Gabryszak A.: MobIE: A German Dataset for Named Entity Recognition, Entity Linking and Relation Extraction in the Mobility Domain, arXiv preprint arXiv:2108.06955, 2021.
[58] Ionescu B., Müller H., Villegas M., Arenas H., Boato G., Dang-Nguyen D.T., Cid Y.D., Eickhoff C., de Herrera A.G.S., Gurrin C., et al.: Overview of ImageCLEF 2017: Information extraction from images. In: International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 315–337, Springer, 2017.
[59] Jain A.: Named Entity Recognition for Hindi Language Using NLP Techniques, Ph.D. Thesis. Jaypee Institute of Information Technology, Noida, Uttar Pradesh, India, 2019. http://hdl.handle.net/10603/241558.
[60] Jain A., Arora A.: Named Entity Recognition in Hindi Using Hyperspace Analogue to Language and Conditional Random Field, Pertanika Journal of Science & Technology, vol. 26(4), pp. 1801–1822, 2018.
[61] Jain A., Arora A.: Named entity system for tweets in Hindi language, International Journal of Intelligent Information Technologies (IJIIT), vol. 14(4), pp. 55–76, 2018.
[62] Jain A., Gairola R., Jain S., Arora A.: Thwarting Spam on Facebook: Identifying Spam Posts Using Machine Learning Techniques. In: Social Network Analytics for Contemporary Business Organizations, pp. 51–70, IGI Global, 2018.
[63] Jain A., Gupta A., Sharma N., Joshi S., Yadav D.: Mining application on analyzing users’ interests from Twitter. In: Proceedings of 3rd International Conference on Internet of Things and Connected Technologies (ICIoTCT), pp. 26–27, 2018.
[64] Jain A., Tayal D., Arora A.: OntoHindi NER – An ontology based novel approach for Hindi named entity recognition, International Journal of Artificial Intelligence (IJAI), vol. 16(2), pp. 106–135, 2018.
[65] Jain A., Tayal D.K., Yadav D., Arora A.: Research trends for named entity recognition in Hindi language. In: Data Visualization and Knowledge Engineering, pp. 223–248, Springer, 2020.
[66] Jain A., Tripathi S., Dwivedi H.D., Saxena P.: Forecasting price of cryptocurrencies using tweets sentiment analysis. In: 2018 Eleventh International Conference on Contemporary Computing (IC3), pp. 1–7, IEEE, 2018.
[67] Jain A., Yadav D., Tayal D.K.: NER for Hindi language using association rules. In: 2014 International Conference on Data Mining and Intelligent Computing (ICDMIC), pp. 1–5, IEEE, 2014.
[68] Jayan J.P., Rajeev R., Sherly E.: A hybrid statistical approach for named entity recognition for Malayalam language. In: Proceedings of the 11th Workshop on Asian Language Resources, pp. 58–63, 2013.
[69] Kambhatla N.: Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction. In: Proceedings of the ACL Interactive Poster and Demonstration Sessions, pp. 178–181, 2004.
[70] Kaur D., Gupta V.: A survey of named entity recognition in English and other Indian languages, International Journal of Computer Science Issues (IJCSI), vol. 7(6), pp. 239–245, 2010.
[71] Kaur Y., Kaur E.R.: Named Entity Recognition (NER) system for Hindi language using combination of rule based approach and list look up approach, International Journal of Scientific Research and Management (IJSRM), vol. 3(3), pp. 2300–2306, 2015.
[72] Kocaman V., Talby D.: Biomedical named entity recognition at scale. In: International Conference on Pattern Recognition, pp. 635–646, Springer, 2021.
[73] Kongburan W., Padungweang P., Krathu W., Chan J.H.: Metabolite Named Entity Recognition: A Hybrid Approach. In: International Conference on Neural Information Processing, pp. 451–460, Springer, 2016.
[74] Konkol M., Brychcín T., Konopík M.: Latent semantics in named entity recognition, Expert Systems with Applications, vol. 42(7), pp. 3470–3479, 2015.
[75] Kozareva Z., Bonev B., Montoyo A.: Self-training and co-training applied to Spanish named entity recognition. In: Mexican International conference on Artificial Intelligence, pp. 770–779, Springer, 2005.
[76] Krishnarao A.A., Gahlot H., Srinet A., Kushwaha D.S.: A comparison of performance of sequential learning algorithms on the task of named entity recognition for Indian languages. In: International Conference on Computational Science, pp. 123–132, Springer, 2009.
[77] Kumar N., Bhattacharyya P.: Named entity recognition in Hindi using MEMM (Technical Report), IIT Mumbai, 2006.
[78] Kumar N.K., Santosh G., Varma V.: A language-independent approach to identify the named entities in under-resourced languages and clustering multilingual documents. In: International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 74–82, Springer, 2011.
[79] Lample G., Ballesteros M., Subramanian S., Kawakami K., Dyer C.: Neural architectures for named entity recognition, arXiv preprint arXiv:1603.01360, 2016.
[80] Leaman R., Lu Z.: TaggerOne: joint named entity recognition and normalization with semi-Markov Models, Bioinformatics, vol. 32(18), pp. 2839–2846, 2016.
[81] Li J., Sun A., Han J., Li C.: A survey on deep learning for named entity recognition, IEEE Transactions on Knowledge and Data Engineering, 2020.
[82] Li P., Wang M., Wang J.: Named entity translation method based on machine translation lexicon, Neural Computing and Applications, vol. 33(9), pp. 3977–3985, 2021.
[83] Li W., McCallum A.: Rapid development of Hindi named entity recognition using conditional random fields and feature induction, ACM Transactions on Asian Language Information Processing (TALIP), vol. 2(3), pp. 290–294, 2003.
[84] Lin S.B., Zhou D.X.: Distributed kernel-based gradient descent algorithms, Constructive Approximation, vol. 47(2), pp. 249–276, 2018.
[85] Liu S., Sun Y., Li B., Wang W., Zhao X.: HAMNER: Headword amplified multi-span distantly supervised method for domain specific named entity recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 8401–8408, 2020.
[86] Meselhi M.A., Bakr H.M.A., Ziedan I., Shaalan K.: Hybrid named entity recognition – application to Arabic language. In: 2014 9th International Conference on Computer Engineering & Systems (ICCES), pp. 80–85, IEEE, 2014.
[87] Mora T., Walczak A.M., Bialek W., Callan C.G.: Maximum entropy models for antibody diversity, Proceedings of the National Academy of Sciences, vol. 107(12), pp. 5405–5410, 2010.
[88] Morwal S., Jahan N., Chopra D.: Named Entity Recognition using Hidden Markov Model (HMM), International Journal on Natural Language Computing (IJNLC), vol. 1(4), pp. 15–23, 2012. doi: 10.5121/ijnlc.2012.1402.
[89] Moussallem D., Wauer M., Ngomo A.C.N.: Machine translation using semantic web technologies: A survey, Journal of Web Semantics, vol. 51, pp. 1–19, 2018.
[90] Nakov P., Hoogeveen D., Màrquez L., Moschitti A., Mubarak H., Baldwin T., Verspoor K.: SemEval-2017 Task 3: Community Question Answering, arXiv preprint arXiv:1912.00730, 2019.
[91] Nanda M.: The Named Entity Recognizer Framework, International Journal of Innovative Research in Advanced Engineering, vol. 1(4), pp. 104–108, 2014.
[92] Nasar Z., Jaffry S.W., Malik M.K.: Named Entity Recognition and Relation Extraction: State of the Art, ACM Computing Surveys (CSUR), vol. 54(1), pp. 1–39, 2021. doi: 10.1145/3445965.
[93] Nayan A., Rao B.R.K., Singh P., Sanyal S., Sanyal R.: Named entity recognition for Indian languages. In: Proceedings of the IJCNLP-08 workshop on named entity recognition for South and South East Asian Languages, 2008.
[94] Neudecker C.: An open corpus for named entity recognition in historic newspapers. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), pp. 4348–4352, 2016.
[95] Nothman J., Ringland N., Radford W., Murphy T., Curran J.R.: Learning multilingual named entity recognition from Wikipedia, Artificial Intelligence, vol. 194, pp. 151–175, 2013.
[96] Osborne M.: Using maximum entropy for sentence extraction. In: Proceedings of the ACL-02 Workshop on Automatic Summarization, pp. 1–8, 2002.
[97] Pakhomov S.: Semi-supervised maximum entropy based approach to acronym and abbreviation normalization in medical texts. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 160–167, 2002.
[98] Patel A., Ramakrishnan G., Bhattacharya P.: Incorporating Linguistic Expertise Using ILP for Named Entity Recognition in Data Hungry Indian Languages. In: International Conference on Inductive Logic Programming, pp. 178–185, Springer, 2009.
[99] Patil N., Patil A.S., Pawar B.: Survey of named entity recognition systems with respect to Indian and foreign languages, International Journal of Computer Applications, vol. 134(16), 2016.
[100] Plu J., Rizzo G., Troncy R.: A hybrid approach for entity recognition and linking. In: Semantic Web Evaluation Challenges, pp. 28–39, Springer, 2015.
[101] Prakash H., Shambhavi B.R.: Approaches to Named Entity Recognition in Indian Languages: A Study, International Journal of Engineering and Advanced Technology (IJEAT), vol. 3(6), pp. 191–194, 2014.
[102] Praveen P., Ravi Kiran V.: Hybrid Named Entity Recognition System for South and South East Asian Languages. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages, 2008. https://aclanthology.org/I08-5012.
[103] Putthividhya D., Hu J.: Bootstrapped named entity recognition for product attribute extraction. In: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1557–1567, 2011.
[104] Quasthoff U., Biemann C., Wolff C.: Named entity learning and verification: expectation maximization in large corpora. In: COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002), 2002.
[105] Ratnaparkhi A., Reynar J., Roukos S.: A maximum entropy model for prepositional phrase attachment. In: Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8–11, 1994, 1994.
[106] Raychaudhuri S., Chang J.T., Sutphin P.D., Altman R.B.: Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature, Genome Research, vol. 12(1), pp. 203–214, 2002.
[107] Saha S., Ekbal A.: Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition, Data & Knowledge Engineering, vol. 85, pp. 15–39, 2013.
[108] Saha S.K., Chatterji S., Dandapat S., Sarkar S., Mitra P.: A Hybrid Approach for Named Entity Recognition in Indian Languages. In: Proceedings of the IJCNLP-08 Workshop on NER for South and South East Asian Languages, pp. 17–24, 2008.
[109] Saha S.K., Mitra P., Sarkar S.: Word clustering and word selection based feature reduction for MaxEnt based Hindi NER. In: Proceedings of ACL-08: HLT, pp. 488–495, 2008.
[110] Saha S.K., Mitra P., Sarkar S.: A comparative study on feature reduction approaches in Hindi and Bengali named entity recognition, Knowledge-Based Systems, vol. 27, pp. 322–332, 2012.
[111] Saha S.K., Mitra P., Sarkar U.: A semi-supervised approach for maximum entropy based Hindi named entity recognition. In: International Conference on Pattern Recognition and Machine Intelligence, pp. 225–230, Springer, 2009.
[112] Saha S.K., Narayan S., Sarkar S., Mitra P.: A composite kernel for named entity recognition, Pattern Recognition Letters, vol. 31(12), pp. 1591–1597, 2010.
[113] Saha S.K., Sarathi Ghosh P., Sarkar S., Mitra P.: Named entity recognition in Hindi using maximum entropy and transliteration, Polibits, (38), pp. 33–41, 2008.
[114] Saha S.K., Sarkar S., Mitra P.: Gazetteer preparation for named entity recognition in Indian languages. In: Proceedings of the 6th Workshop on Asian Language Resources, 2008.
[115] Saha S.K., Sarkar S., Mitra P.: A hybrid feature set based maximum entropy Hindi named entity recognition. In: Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I, 2008.
[116] Saha S.K., Sarkar S., Mitra P.: Feature selection techniques for maximum entropy based biomedical named entity recognition, Journal of Biomedical Informatics, vol. 42(5), pp. 905–911, 2009.
[117] Saha S.K., Sarkar S., Mitra P.: Hindi named entity annotation error detection and correction, Language Forum, vol. 35(2), pp. 73–93, 2009.
[118] Sahin H.B., Tirkaz C., Yildiz E., Eren M.T., Sonmez O.: Automatically annotated Turkish corpus for named entity recognition and text categorization using large-scale gazetteers, arXiv preprint arXiv:1702.02363, 2017.
[119] Sasidhar B., Yohan P., Babu A.V., Govardhan A.: A survey on named entity recognition in Indian languages with particular reference to Telugu, International Journal of Computer Science Issues, vol. 8(2), pp. 438–443, 2011.
[120] Shaalan K., Oudah M.: A hybrid approach to Arabic named entity recognition, Journal of Information Science, vol. 40(1), pp. 67–87, 2014.
[121] Sharma P., Sharma U., Kalita J.: Named entity recognition: A survey for the Indian languages, Parsing in Indian Languages, pp. 35–39, 2011.
[122] Sharma R., Morwal S., Agarwal B., Chandra R., Khan M.S.: A deep neural network-based model for named entity recognition for Hindi language, Neural Computing and Applications, vol. 32(20), pp. 16191–16203, 2020.
[123] Sharnagat R., Bhattacharyya P.: Hindi named entity recognizer for NER task of FIRE 2013, FIRE-2013, 2013.
[124] Shishtla P.M., Pingali P., Varma V.: A character n-gram based approach for improved recall in Indian language NER. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages, 2008.
[125] Sikdar U.K., Ekbal A., Saha S.: Differential evolution based feature selection and classifier ensemble for named entity recognition. In: Proceedings of COLING 2012, pp. 2475–2490, 2012.
[126] Singh A.K.: Named entity recognition for south and south east Asian languages: taking stock. In: Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages, 2008.
[127] Smyrnioudis N.: A Transformer-based Natural Language Processing Toolkit for Greek–Named Entity Recognition and Multi-task Learning, Bachelor Thesis. Athens University of Economics and Business, Greece, 2021. http://www2.aueb.gr/users/ion/docs/smyrnioudis_bsc_thesis.pdf.
[128] Speck R., Ngomo A.C.N.: Ensemble learning for named entity recognition. In: International Semantic Web Conference, pp. 519–534, Springer, 2014.
[129] Srivastava S., Sanglikar M., Kothari D.: Named entity recognition system for Hindi language: a hybrid approach, International Journal of Computational Linguistics (IJCL), vol. 2(1), pp. 10–23, 2011.
[130] Suárez-Paniagua V., Dong H., Casey A.: A multi-BERT hybrid system for named entity recognition in spanish radiology reports, CLEF eHealth, 2021.
[131] Szarvas G., Farkas R., Kocsor A.: A Multilingual Named Entity Recognition System Using Boosting and C4.5 Decision Tree Learning Algorithms. In: International Conference on Discovery Science, pp. 267–278, Springer, 2006.
[132] Tanabe L., Xie N., Thom L.H., Matten W., Wilbur W.J.: GENETAG: a tagged corpus for gene/protein named entity recognition, BMC Bioinformatics, vol. 6(1), pp. 1–7, 2005.
[133] Thomas M., Latha C.: Sentimental analysis of transliterated text in Malayalam using recurrent neural networks, Journal of Ambient Intelligence and Humanized Computing, vol. 12(6), pp. 6773–6780, 2021.
[134] Uchimoto K., Sekine S., Isahara H.: The unknown word problem: a morphological analysis of Japanese using maximum entropy aided by a dictionary. In: Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, 2001.
[135] Wang X., Yang C., Guan R.: A comparative study for biomedical named entity recognition, International Journal of Machine Learning and Cybernetics, vol. 9(3), pp. 373–382, 2018.
[136] Wang Y., Wang L., Rastegar-Mojarad M., Moon S., Shen F., Afzal N., Liu S., Zeng Y., Mehrabi S., Sohn S., Liu H.: Clinical information extraction applications: A literature review, Journal of Biomedical Informatics, vol. 77, pp. 34–49, 2018. doi: 10.1016/j.jbi.2017.11.011.
[137] Xiong D., Liu Q., Lin S.: Maximum entropy based phrase reordering model for statistical machine translation. In: Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 521–528, ACL, 2006.
[138] Yadav V., Bethard S.: A Survey on Recent Advances in Named Entity Recognition from Deep Learning models, arXiv preprint arXiv:1910.11470, 2019. doi: 10.48550/ARXIV.1910.11470.
[139] Yaseen U., Langer S.: Neural Text Classification and Stacked Heterogeneous Embeddings for Named Entity Recognition in SMM4H 2021, arXiv preprint arXiv:2106.05823, 2021. doi: 10.48550/ARXIV.2106.05823.
[140] Zhao L., Li L., Zheng X., Zhang J.: A BERT based Sentiment Analysis and Key Entity Detection Approach for Online Financial Texts. In: 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 1233–1238, IEEE, 2021. doi: 10.1109/CSCWD49262.2021.9437616.
[141] Zhou Y., Ju C., Caufield J.H., Shih K., Chen C., Sun Y., Chang K.W., Ping P., Wang W.: Clinical Named Entity Recognition using Contextualized Token Representations, CoRR, vol. abs/2106.12608, 2021. https://arxiv.org/abs/2106.12608.

Notes

Record prepared with funding from MEiN (the Polish Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2022-2023).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-6423e839-4335-471c-890b-b74fdadf407a