Article title

Using Word Embeddings for Italian Crime News Categorization

Conference
Federated Conference on Computer Science and Information Systems (16th; 2-5 September 2021; online)
Publication languages
EN
Abstracts
EN
Several studies have shown that the use of embeddings improves outcomes in many NLP tasks, including text categorization. In this paper, we focus on how word embeddings can be applied to newspaper articles about crimes in order to categorize them by the type of crime they report. We tested our approach on an Italian dataset of 15,361 crime news articles, combining different Word2Vec models with supervised and unsupervised Machine Learning categorization algorithms. The tests show very promising results.
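The abstract only summarizes the approach. As a rough illustration of one common way word embeddings feed a text categorizer, the sketch below averages word vectors into a single document vector and assigns a crime category by nearest centroid under cosine similarity. It is not the authors' pipeline: the three-dimensional vectors, the Italian tokens, the category labels and every function name are hypothetical, and a realistic setup would load a trained Word2Vec model (typically 100-300 dimensions) and could use a stronger supervised classifier, or cluster the same document vectors for the unsupervised case.

```python
import math
from collections import defaultdict

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
EMBEDDINGS = {
    "furto": [0.9, 0.1, 0.0],       # theft
    "rubato": [0.8, 0.2, 0.1],      # stolen
    "borseggio": [0.7, 0.3, 0.0],   # pickpocketing
    "spaccio": [0.0, 0.9, 0.1],     # drug dealing
    "droga": [0.1, 0.8, 0.2],       # drugs
    "cocaina": [0.0, 0.95, 0.05],   # cocaine
}

# Hypothetical labelled training articles: (tokens, crime category).
TRAINING = [
    (["furto", "rubato"], "theft"),
    (["borseggio", "furto"], "theft"),
    (["spaccio", "droga"], "drugs"),
    (["cocaina", "droga"], "drugs"),
]


def doc_vector(tokens):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def train_centroids(labelled_docs):
    """Nearest-centroid training: mean document vector per crime category."""
    sums, counts = {}, defaultdict(int)
    for tokens, label in labelled_docs:
        vec = doc_vector(tokens)
        if vec is None:
            continue  # skip articles with no known words
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def classify(tokens, centroids):
    """Return the category whose centroid is most similar to the article vector."""
    vec = doc_vector(tokens)
    if vec is None:
        return None
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


if __name__ == "__main__":
    centroids = train_centroids(TRAINING)
    print(classify(["rubato", "borseggio"], centroids))
```

Averaging discards word order but gives every article a fixed-length vector, which is what lets the same representation be handed to either a supervised classifier or a clustering algorithm.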
Pages
461-470
Physical description
Bibliography: 26 items, illustrations, tables
Authors
  • Enzo Ferrari Engineering Department, University of Modena and Reggio Emilia, Italy
  • Enzo Ferrari Engineering Department, University of Modena and Reggio Emilia, Italy
  • Enzo Ferrari Engineering Department, University of Modena and Reggio Emilia, Italy
References
  • 1. S. Ghankutkar, N. Sarkar, P. Gajbhiye, S. Yadav, D. Kalbande, and N. Bakereywala, “Modelling machine learning for analysing crime news,” in 2019 International Conference on Advances in Computing, Communication and Control (ICAC3), 2019, pp. 1-5. [Online]. Available: https://doi.org/10.1109/ICAC347590.2019.9036769
  • 2. M. Hassan and M. Z. Rahman, “Crime news analysis: Location and story detection,” in 2017 20th International Conference of Computer and Information Technology (ICCIT), 2017, pp. 1-6. [Online]. Available: https://doi.org/10.1109/ICCITECHN.2017.8281798
  • 3. D. Velásquez, S. Medina, G. Yamada, P. Lavado, M. Núñez, H. Alatrista, and J. Morzan, “I read the news today, oh boy: The effect of crime news coverage on crime perception and trust,” Institute of Labor Economics (IZA), IZA Discussion Papers 12056, Dec. 2018. [Online]. Available: https://ideas.repec.org/p/iza/izadps/dp12056.html
  • 4. D. Ghosh, S. A. Chun, B. Shafiq, and N. R. Adam, “Big data-based smart city platform: Real-time crime analysis,” in Proceedings of the 17th International Digital Government Research Conference on Digital Government Research, DG.O 2016, Shanghai, China, June 8-10, 2016, Y. Kim and S. M. Liu, Eds. ACM, 2016, pp. 58-66. [Online]. Available: https://doi.org/10.1145/2912160.2912205
  • 5. S. K and P. S. Thilagam, “Crime base: Towards building a knowledge base for crime entities and their relationships from online newspapers,” Information Processing & Management, vol. 56, no. 6, p. 102059, 2019. [Online]. Available: https://doi.org/10.1016/j.ipm.2019.102059
  • 6. L. Po and F. Rollo, “Building an urban theft map by analyzing newspaper crime reports,” in 2018 13th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), 2018, pp. 13-18. [Online]. Available: https://doi.org/10.1109/SMAP.2018.8501866
  • 7. T. Dasgupta, A. Naskar, R. Saha, and L. Dey, “Crimeprofiler: Crime information extraction and visualization from news media,” in Proceedings of the International Conference on Web Intelligence, ser. WI ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 541-549. [Online]. Available: https://doi.org/10.1145/3106426.3106476
  • 8. F. Rollo and L. Po, “Crime event localization and deduplication,” in The Semantic Web - ISWC 2020, J. Z. Pan, V. Tamma, C. d’Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne, and L. Kagal, Eds. Cham: Springer International Publishing, 2020, pp. 361-377. [Online]. Available: https://doi.org/10.1007/978-3-030-62466-8_23
  • 9. L. Po, F. Rollo, and R. T. Lado, “Topic detection in multichannel Italian newspapers,” in Semantic Keyword-Based Search on Structured Data Sources - COST Action IC1302 Second International KEYSTONE Conference, IKC 2016, Cluj-Napoca, Romania, September 8-9, 2016, Revised Selected Papers, ser. Lecture Notes in Computer Science, A. Calì, D. Gorgan, and M. Ugarte, Eds., vol. 10151, 2016, pp. 62-75. [Online]. Available: https://doi.org/10.1007/978-3-319-53640-8_6
  • 10. F. Rollo, “A key-entity graph for clustering multichannel news: student research abstract,” in Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017, A. Seffah, B. Penzenstadler, C. Alves, and X. Peng, Eds. ACM, 2017, pp. 699-700. [Online]. Available: https://doi.org/10.1145/3019612.3019930
  • 11. S. Bergamaschi, L. Po, and S. Sorrentino, “Comparing topic models for a movie recommendation system,” in WEBIST 2014 - Proceedings of the 10th International Conference on Web Information Systems and Technologies, Volume 2, Barcelona, Spain, 3-5 April, 2014, V. Monfort and K. Krempels, Eds. SciTePress, 2014, pp. 172-183. [Online]. Available: https://doi.org/10.5220/0004835601720183
  • 12. L. Po and D. Malvezzi, “Community detection applied on big linked data,” J. Univers. Comput. Sci., vol. 24, no. 11, pp. 1627-1650, 2018. [Online]. Available: http://www.jucs.org/jucs_24_11/community_detection_applied_on
  • 13. C. Wang, P. Nulty, and D. Lillis, “A comparative study on word embeddings in deep learning for text classification,” in Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval, ser. NLPIR 2020. New York, NY, USA: Association for Computing Machinery, 2020, pp. 37-46. [Online]. Available: https://doi.org/10.1145/3443279.3443304
  • 14. T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2013. [Online]. Available: http://arxiv.org/abs/1301.3781
  • 15. P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, “Enriching word vectors with subword information,” Transactions of the Association for Computational Linguistics, vol. 5, pp. 135-146, 2017. [Online]. Available: https://doi.org/10.1162/tacl_a_00051
  • 16. J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global vectors for word representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, A. Moschitti, B. Pang, and W. Daelemans, Eds. ACL, 2014, pp. 1532-1543. [Online]. Available: https://doi.org/10.3115/v1/d14-1162
  • 17. A. Moreo, A. Esuli, and F. Sebastiani, “Word-class embeddings for multiclass text classification,” Data Min. Knowl. Discov., vol. 35, no. 3, pp. 911-963, 2021. [Online]. Available: https://doi.org/10.1007/s10618-020-00735-3
  • 18. A. Fesseha, S. Xiong, E. D. Emiru, M. Diallo, and A. Dahou, “Text classification based on convolutional neural networks and word embedding for low-resource languages: Tigrinya,” Inf., vol. 12, no. 2, p. 52, 2021. [Online]. Available: https://doi.org/10.3390/info12020052
  • 19. A. Borg, M. Boldt, O. Rosander, and J. Ahlstrand, “E-mail classification with machine learning and word embeddings for improved customer support,” Neural Comput. Appl., vol. 33, no. 6, pp. 1881-1902, 2021. [Online]. Available: https://doi.org/10.1007/s00521-020-05058-4
  • 20. E. Christodoulou, A. Gregoriades, M. Pampaka, and H. Herodotou, “Application of classification and word embedding techniques to evaluate tourists’ hotel-revisit intention,” in Proceedings of the 23rd International Conference on Enterprise Information Systems, ICEIS 2021, Online Streaming, April 26-28, 2021, Volume 1, J. Filipe, M. Smialek, A. Brodsky, and S. Hammoudi, Eds. SCITEPRESS, 2021, pp. 216-223. [Online]. Available: https://doi.org/10.5220/0010453502160223
  • 21. P. Semberecki and H. Maciejewski, “Deep learning methods for subject text classification of articles,” in Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017, Prague, Czech Republic, September 3-6, 2017, ser. Annals of Computer Science and Information Systems, M. Ganzha, L. A. Maciaszek, and M. Paprzycki, Eds., vol. 11, 2017, pp. 357-360. [Online]. Available: https://doi.org/10.15439/2017F414
  • 22. T. Lin, “Performance of different word embeddings on text classification,” https://towardsdatascience.com/nlp-performance-of-different-word-embeddings-on-text-classification-de648c6262b, 2019, accessed: 7 June 2021.
  • 23. J. Lilleberg, Y. Zhu, and Y. Zhang, “Support vector machines and word2vec for text classification with semantic features,” in 14th IEEE International Conference on Cognitive Informatics & Cognitive Computing, ICCI*CC 2015, Beijing, China, July 6-8, 2015, N. Ge, J. Lu, Y. Wang, N. Howard, P. Chen, X. Tao, B. Zhang, and L. A. Zadeh, Eds. IEEE Computer Society, 2015, pp. 136-140. [Online]. Available: https://doi.org/10.1109/ICCI-CC.2015.7259377
  • 24. G. Di Gennaro, A. Buonanno, A. Di Girolamo, A. Ospedale, F. A. N. Palmieri, and G. Fedele, An Analysis of Word2Vec for the Italian Language. Singapore: Springer Singapore, 2021, pp. 137-146. [Online]. Available: https://doi.org/10.1007/978-981-15-5093-5_13
  • 25. B. Li, A. Drozd, Y. Guo, T. Liu, S. Matsuoka, and X. Du, “Scaling word2vec on big corpus,” Data Sci. Eng., vol. 4, no. 2, pp. 157-175, 2019. [Online]. Available: https://doi.org/10.1007/s41019-019-0096-6
  • 26. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” CoRR, vol. abs/1106.1813, 2011. [Online]. Available: https://doi.org/10.1613/jair.953
Notes
1. Track 3: Advances in Information Systems and Technology
2. Session: 27th Conference on Knowledge Acquisition and Management
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-58774571-da5a-4181-a9b7-e12be1ce6202