

Article title

Impact of n-stage latent Dirichlet allocation on analysis of headline classification

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Data analysis becomes more difficult as the amount of data grows. In particular, extracting meaningful insights from large volumes of data and grouping it by shared features without human intervention requires advanced methodologies. Topic-modeling methods help overcome this problem in text analysis for downstream tasks such as sentiment analysis, spam detection, and news classification. In this research, we benchmark several classifiers (random forest, AdaBoost, naive Bayes, and logistic regression) using the classical latent Dirichlet allocation (LDA) and n-stage LDA topic-modeling methods for feature extraction in headline classification. We ran our experiments on three and five classes of publicly available Turkish and English datasets. We demonstrate that, as a feature extractor, n-stage LDA obtains state-of-the-art performance for any downstream classifier. Random forest was the most successful algorithm for both datasets.
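The feature-extraction setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, assuming scikit-learn is available; the toy headlines, labels, topic count, and forest size are invented for the example and are not taken from the paper. Only classical LDA is shown, because this record does not describe how the n-stage variant is constructed.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Toy headlines and labels, invented purely for illustration (assumption).
headlines = [
    "team wins championship final after late goal",
    "striker scores twice as team wins derby",
    "coach praises team after cup final",
    "central bank raises interest rates again",
    "stock market falls as interest rates rise",
    "bank shares rise after market rally",
    "new phone launch features faster chip",
    "software update brings new phone features",
    "chip maker unveils faster processor",
]
labels = ["sport"] * 3 + ["economy"] * 3 + ["technology"] * 3

# Bag-of-words counts -> LDA topic proportions -> classifier.
# n_components=3 and n_estimators=50 are arbitrary choices for this sketch.
pipeline = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=3, random_state=0),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(headlines, labels)

# Each headline is represented by its document-topic distribution,
# which is the feature vector the classifier sees.
topic_features = pipeline[:-1].transform(headlines)
predictions = pipeline.predict(headlines)
```

Swapping `RandomForestClassifier` for `AdaBoostClassifier`, `MultinomialNB`, or `LogisticRegression` would cover the other classifiers named in the abstract. The predictions here are made on the training headlines themselves, so they say nothing about real accuracy.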
Publisher
Journal
Year
Volume
Pages
375--394
Physical description
Bibliography: 31 items, figures, tables.
Authors
  • Ege University, Turkey
author
  • Yildiz Technical University, Turkey
  • Walmart Global Tech, USA
Bibliography
  • [1] Akhtar N., Zubair N., Kumar A., Ahmad T.: Aspect based Sentiment Oriented Summarization of Hotel Reviews, Procedia Computer Science, vol. 115, pp. 563–571, 2017. doi: 10.1016/j.procs.2017.09.115.
  • [2] Atzeni M., Dridi A., Reforgiato Recupero D.: Fine-Grained Sentiment Analysis on Financial Microblogs and News Headlines. In: Semantic Web Challenges. SemWebEval 2017, pp. 124–128, Communications in Computer and Information Science, vol. 769, Springer, Cham, 2017. doi: 10.1007/978-3-319-69146-6_11.
  • [3] Bastani K., Namavari H., Shaffer J.: Latent Dirichlet allocation (LDA) for topic modeling of the CFPB consumer complaints, Expert Systems with Applications, vol. 127, pp. 256–271, 2019. doi: 10.1016/j.eswa.2019.03.001.
  • [4] Blei D.M., Ng A.Y., Jordan M.I.: Latent Dirichlet allocation, Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
  • [5] Bonte C., Vercauteren F.: Privacy-preserving logistic regression training, BMC Medical Genomics, vol. 11, 2018. doi: 10.1186/s12920-018-0398-y.
  • [6] Chen H., Gilad-Bachrach R., Han K., Huang Z., Jalali A., Laine K., Lauter K.: Logistic regression over encrypted data from fully homomorphic encryption, BMC Medical Genomics, vol. 11, 2018. doi: 10.1186/s12920-018-0397-z.
  • [7] Chen S., Webb G.I., Liu L., Ma X.: A novel selective naïve Bayes algorithm, Knowledge-Based Systems, vol. 192, 2020. doi: 10.1016/j.knosys.2019.105361.
  • [8] Darst B.F., Malecki K.C., Engelman C.D.: Using recursive feature elimination in random forest to account for correlated variables in high dimensional data, BMC Genetics, vol. 19, 2018. doi: 10.1186/s12863-018-0633-8.
  • [9] Du Y.J., Yi Y.T., Li X.Y., Chen X.L., Fan Y.Q., Su F.H.: Extracting and tracking hot topics of micro-blogs based on improved Latent Dirichlet Allocation, Engineering Applications of Artificial Intelligence, vol. 87, 2020. doi: 10.1016/j.engappai.2019.103279.
  • [10] Ferreira-Mello R., Andre M., Pinheiro A., Costa E., Romero C.: Text mining in education, WIREs Data Mining and Knowledge Discovery, vol. 9(6), 2019. doi: 10.1002/widm.1332.
  • [11] García-Pablos A., Cuadros M., Rigau G.: W2VLDA: Almost unsupervised system for Aspect Based Sentiment Analysis, Expert Systems with Applications, vol. 91, pp. 127–137, 2018. doi: 10.1016/j.eswa.2017.08.049.
  • [12] Guven Z.A., Diri B., Cakaloglu T.: n-stage Latent Dirichlet Allocation: A Novel Approach for LDA, CoRR, 2021. doi: 10.48550/arXiv.2110.08591.
  • [13] Guven Z.A., Diri B., Çakaloglu T.: Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets, Academic Platform Journal of Engineering and Science, vol. 7(3), pp. 467–472, 2019. doi: 10.21541/apjes.459447.
  • [14] Guven Z.A., Diri B., Cakaloglu T.: Comparison of n-stage Latent Dirichlet Allocation versus other topic modeling methods for emotion analysis, Journal of the Faculty of Engineering and Architecture of Gazi University, vol. 35(4), pp. 2135–2146, 2020. doi: 10.17341/gazimmfd.556104.
  • [15] Hoffman M.D., Blei D.M., Bach F.: Online Learning for Latent Dirichlet Allocation. In: NIPS’10: Proceedings of the 23rd International Conference on Neural Information Processing Systems – Volume 1, pp. 856–864, 2010.
  • [16] Jelodar H., Wang Y., Yuan C., Feng X., Jiang X., Li Y., Zhao L.: Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey, Multimedia Tools and Applications, vol. 78, pp. 15169–15211, 2019. doi: 10.1007/s11042-018-6894-4.
  • [17] Kim J.H., Mantrach A., Jaimes A., Oh A.: How to Compete Online for News Audience: Modeling Words that Attract Clicks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1645–1654, 2016. doi: 10.1145/2939672.2939873.
  • [18] Li C., Duan Y., Wang H., Zhang Z., Sun A., Ma Z.: Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings, ACM Transactions on Information Systems, vol. 36(2), pp. 1–30, 2017. doi: 10.1145/3091108.
  • [19] Li J., Li G., Liu M., Zhu X., Wei L.: A novel text-based framework for forecasting agricultural futures using massive online news headlines, International Journal of Forecasting, vol. 38(1), pp. 35–50, 2020. doi: 10.1016/j.ijforecast.2020.02.002.
  • [20] Li Z., White J.C., Wulder M.A., Hermosilla T., Davidson A.M., Comber A.J.: Land cover harmonization using Latent Dirichlet Allocation, International Journal of Geographical Information Science, vol. 35(2), pp. 348–374, 2021. doi: 10.1080/13658816.2020.1796131.
  • [21] Liu S., Guo L., Mays K., Betke M., Wijaya D.T.: Detecting Frames in News Headlines and Its Application to Analyzing News Framing Trends Surrounding U.S. Gun Violence. In: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pp. 504–514, 2019. doi: 10.18653/v1/k19-1047.
  • [22] Lu Z., Liu W., Zhou Y., Hu X., Wang B.: An Effective Approach for Chinese News Headline Classification Based on Multi-representation Mixed Model with Attention and Ensemble Learning. In: Natural Language Processing and Chinese Computing. NLPCC 2017, Lecture Notes in Computer Science, vol. 10619, pp. 339–350, Springer, Cham, 2018. doi: 10.1007/978-3-319-73618-1_29.
  • [23] Omidvar A., Pourmodheji H., An A., Edall G.: Learning to Determine the Quality of News Headlines. In: Proceedings of the 12th International Conference on Agents and Artificial Intelligence – Volume 1: NLPinAI, pp. 401–409, 2020. doi: 10.5220/0009367504010409.
  • [24] Probst P., Boulesteix A.L.: To Tune or Not to Tune the Number of Trees in Random Forest, Journal of Machine Learning Research, vol. 18, pp. 1–18, 2018.
  • [25] Seifollahi S., Shajari M.: Word sense disambiguation application in sentiment analysis of news headlines: an applied approach to FOREX market prediction, Journal of Intelligent Information Systems, vol. 52, pp. 57–83, 2019. doi: 10.1007/s10844-018-0504-9.
  • [26] Sommeria-Klein G., Zinger L., Coissac E., Iribar A., Schimann H., Taberlet P., Chave J.: Latent Dirichlet Allocation reveals spatial and taxonomic structure in a DNA-based census of soil biodiversity from a tropical forest, Molecular Ecology Resources, vol. 20(2), pp. 371–386, 2020. doi: 10.1111/1755-0998.13109.
  • [27] Stevens K., Kegelmeyer P., Andrzejewski D., Buttler D.: Exploring Topic Coherence over Many Models and Many Topics. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 952–961, 2012. https://aclanthology.org/D12-1087.
  • [28] Syed S., Spruit M.: Full-Text or Abstract? Examining Topic Coherence Scores Using Latent Dirichlet Allocation. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 165–174, 2017. doi: 10.1109/DSAA.2017.61.
  • [29] Wang W., Feng Y., Dai W.: Topic analysis of online reviews for two competitive products using latent Dirichlet allocation, Electronic Commerce Research and Applications, vol. 29, pp. 142–156, 2018. doi: 10.1016/j.elerap.2018.04.003.
  • [30] Xiao L., Dong Y., Dong Y.: An improved combination approach based on Adaboost algorithm for wind speed time series forecasting, Energy Conversion and Management, vol. 160, pp. 273–288, 2018. doi: 10.1016/j.enconman.2018.01.038.
  • [31] Xing W., Lee H.S., Shibani A.: Identifying patterns in students’ scientific argumentation: content analysis through text mining using Latent Dirichlet Allocation, Educational Technology Research and Development, vol. 68, pp. 2185–2214, 2020. doi: 10.1007/s11423-020-09761-w.
Notes
PL
Record prepared with funds from MEiN, agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: Popularisation of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-6765a2b0-92d3-4bdf-b35c-1df64a6b6fa2