Topic modeling is an effective way to gain insight into large amounts of data. Some of the most widely used topic models are Latent Dirichlet Allocation (LDA) and Nonnegative Matrix Factorization (NMF). However, with the rise of self-attention models and pre-trained language models, new ways to mine topics have emerged, and BERTopic represents the current state of the art in topic modeling. In this paper, we compared the performance of LDA, NMF, and BERTopic on literary texts in Serbian by measuring Topic Coherence (TC) and Topic Diversity (TD), as well as by qualitatively evaluating the topics. For BERTopic, we compared multilingual sentence-transformer embeddings with monolingual Jerteh-355 embeddings for Serbian. NMF yielded the best TC, while BERTopic with Jerteh-355 embeddings achieved the best TD; Jerteh-355 also outperformed the sentence-transformer embeddings on both metrics.
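A minimal sketch of the BERTopic setup described above, assuming Python with the bertopic and sentence-transformers packages. The HuggingFace identifier "jerteh/Jerteh-355" and the mean-pooling loading path are assumptions for illustration, and the topic-diversity function follows the common definition (Dieng et al., 2020): the fraction of unique words among the top-k words of all topics.

    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer

    docs = ["..."]  # literary texts in Serbian (placeholder)

    # Assumed HuggingFace identifier for the monolingual Serbian encoder;
    # sentence-transformers falls back to mean pooling for plain transformer models.
    embedding_model = SentenceTransformer("jerteh/Jerteh-355")
    topic_model = BERTopic(embedding_model=embedding_model)
    topics, probs = topic_model.fit_transform(docs)

    def topic_diversity(model, top_k=10):
        """Fraction of unique words across the top-k words of every topic."""
        words = [
            w
            for t in model.get_topics()
            if t != -1  # skip BERTopic's outlier topic
            for w, _ in model.get_topic(t)[:top_k]
        ]
        return len(set(words)) / len(words) if words else 0.0

    print(topic_diversity(topic_model))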
This paper presents the development of a Named Entity Linking (NEL) model for the Serbian language, named SrpCNNeL, which links entities to the Wikidata knowledge base. The model was trained to recognize and link seven named entity types (persons, locations, organisations, professions, events, demonyms, and works of art) on a dataset containing sentences from novels and legal documents, as well as sentences generated from the Wikidata knowledge base and the Leximirka lexical database. The resulting model demonstrated robust performance, achieving an F1 score of 0.8 on the test set. Since locations are the entity type most frequently linked to the knowledge base in the dataset, an additional evaluation was conducted on an independent dataset for locations only, comparing the model to the baseline spaCy Entity Linker.
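A minimal sketch of the baseline side of that comparison, assuming the spacy-entity-linker package, which resolves spaCy entity mentions to Wikidata items; the pipeline name, example sentence, and printed fields are illustrative, not taken from the paper's evaluation data.

    import spacy

    # Baseline: a spaCy pipeline extended with the spacy-entity-linker component
    # (pip install spacy-entity-linker), which links mentions to Wikidata.
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("entityLinker", last=True)

    doc = nlp("Belgrade is the capital of Serbia.")
    for ent in doc._.linkedEntities:
        # Each linked entity exposes its Wikidata identifier and label.
        print(ent.get_span(), ent.get_id(), ent.get_label())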