Comparative Analysis of Word Embedding and Machine Learning Techniques for Classification of Software Developer Communications on Gitter

Akshar, Tumu; Kumar, Lov; Yogita; Murthy, Lalita Bhanu

doi:10.15439/2023F6950

Article details

Authors

Akshar Tumu, Kumar Lov, Yogita, Murthy Lalita Bhanu

Selected full texts from this journal

http://annals-csis.org

Identifiers

DOI

10.15439/2023F6950

Abstract

Software developers now widely use instant messaging and collaboration platforms, which help them explore new technologies, raise development-related issues, and seek solutions from their peers online. Gitter is one such platform with a large user base. It generates a large volume of data, whose analysis can yield insights into trends in open-source software development and developers' inclination toward various technologies. Classification techniques can be deployed for this purpose. The choice of word embedding for a given dataset of text messages plays a vital role in determining classification performance. In the present work, a comparative analysis of nine word embeddings in combination with seventeen classification techniques, using one-vs-one and one-vs-rest strategies, has been performed on the GitterCom dataset to categorize text messages into one of the predetermined classes based on their purpose. Further, two feature selection methods have been applied, and the SMOTE technique has been used to handle data imbalance. This resulted in a total of 612 classification pipelines for analysis. The experimental results show that word2vec, GloVe with a vector size of 300, and GloVe with a vector size of 100 are the three top-performing word embeddings when performance is compared across the different classification techniques. Models trained using ANOVA-selected features performed similarly to models trained using all features. Finally, using the SMOTE technique improves the models' predictive ability.
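The kind of pipeline the abstract describes (embedding vectors, class balancing, ANOVA-based feature selection, then a one-vs-rest or one-vs-one classifier) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data is synthetic and stands in for message embeddings, the helper `smote_like_oversample` is a hypothetical, simplified version of the SMOTE interpolation idea, and logistic regression with `k=20` selected features is an arbitrary choice rather than one of the paper's reported settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.pipeline import make_pipeline


def smote_like_oversample(X, y, k=5, seed=0):
    """Hypothetical, simplified SMOTE-style oversampling.

    Grows every minority class to the majority-class size by interpolating
    between a sample and one of its k nearest same-class neighbours.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        n_new = target - count
        if n_new == 0 or count < 2:
            continue  # majority class, or too few samples to interpolate
        Xc = X[y == cls]
        # Pairwise distances within the class; a sample is not its own neighbour.
        dist = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        np.fill_diagonal(dist, np.inf)
        neighbours = np.argsort(dist, axis=1)[:, : min(k, count - 1)]
        base = rng.integers(0, count, size=n_new)
        pick = rng.integers(0, neighbours.shape[1], size=n_new)
        neigh = neighbours[base, pick]
        gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[neigh] - Xc[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced stand-in for message embeddings (NOT the GitterCom data).
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=10,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Oversample the training split only, so the test split stays untouched.
X_bal, y_bal = smote_like_oversample(X_train, y_train)

results = {}
strategies = [("one-vs-rest", OneVsRestClassifier), ("one-vs-one", OneVsOneClassifier)]
for name, wrapper in strategies:
    model = make_pipeline(
        SelectKBest(f_classif, k=20),  # ANOVA F-test feature selection
        wrapper(LogisticRegression(max_iter=1000)),
    )
    model.fit(X_bal, y_bal)
    results[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {results[name]:.3f}")
```

Balancing only the training split keeps synthetic samples out of the evaluation data. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would likely be preferable to the hand-written helper.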

Keywords

functional requirements; non-functional requirements; deep learning; data imbalance methods; feature selection; classification techniques; word embedding

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2023

Volume

Vol. 35

Pages

335-346

Physical description

Bibliography: 12 items, tables, illustrations.

Contributors

author

Akshar Tumu

f20200003@hyderabad.bits-pilani.ac.in

Department of Computer Science & Information Systems BITS Pilani Hyderabad Campus

author

Kumar Lov

lovkumar@nitkkr.ac.in

Department of Computer Engineering National Institute of Technology, Kurukshetra

author

Yogita

yogita@nitkkr.ac.in

Department of Computer Engineering National Institute of Technology, Kurukshetra

author

Murthy Lalita Bhanu

bhanu@hyderabad.bits-pilani.ac.in

Department of Computer Science & Information Systems BITS Pilani Hyderabad Campus

References

1. E. Parra, A. Ellis, and S. Haiduc, “Gittercom: A dataset of open source developer communications in gitter,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 563–567.
2. O. Ehsan, S. Hassan, M. E. Mezouar, and Y. Zou, “An empirical study of developer discussions in the gitter platform,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 30, no. 1, pp. 1–39, 2020.
3. H. Sahar, A. Hindle, and C.-P. Bezemer, “How are issue reports discussed in gitter chat rooms?” Journal of Systems and Software, vol. 172, p. 110852, 2021.
4. E. Parra, M. Alahmadi, A. Ellis, and S. Haiduc, “A comparative study and analysis of developer communications on slack and gitter,” Empirical Software Engineering, vol. 27, no. 2, p. 40, 2022.
5. B. Lin, A. Zagalsky, M.-A. Storey, and A. Serebrenik, “Why developers are slacking off: Understanding how software teams use slack,” in Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, 2016, pp. 333–336.
6. V. Stray, N. B. Moe, and M. Noroozi, “Slack me if you can! using enterprise social networking tools in virtual agile teams,” in 2019 ACM/IEEE 14th International Conference on Global Software Engineering (ICGSE). IEEE, 2019, pp. 111–121.
7. R. Alkadhi, T. Lata, E. Guzman, and B. Bruegge, “Rationale in development chat messages: an exploratory study,” in 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 2017, pp. 436–446.
8. R. Alkadhi, J. O. Johanssen, E. Guzman, and B. Bruegge, “React: An approach for capturing rationale in chat messages,” in 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 2017, pp. 175–180.
9. S. Beyer, C. Macho, M. Pinzger, and M. Di Penta, “Automatically classifying posts into question categories on stack overflow,” in Proceedings of the 26th Conference on Program Comprehension, 2018, pp. 211–221.
10. A. S. M. Venigalla, C. Lakkundi, and S. Chimalakonda, “Sotagger - towards classifying stack overflow posts through contextual tagging (s),” 07 2019, pp. 493–496.
11. E. Guzman, M. Ibrahim, and M. Glinz, “A little bird told me: Mining tweets for requirements and software evolution,” in 2017 IEEE 25th International Requirements Engineering Conference (RE). IEEE, 2017, pp. 11–20.
12. S. Tiun, U. Mokhtar, S. Bakar, and S. Saad, “Classification of functional and non-functional requirement in software requirement using word2vec and fast text,” in Journal of Physics: Conference Series, vol. 1529, no. 4. IOP Publishing, 2020, p. 042077.

Notes

1. This research is funded by TestAIng Solutions Pvt. Ltd.

2. Thematic Tracks Regular Papers

3. Record prepared with funding from MEiN (Polish Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the "Social Responsibility of Science" programme, module: Popularisation of Science and Promotion of Sport (2024).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-905b012b-444d-4835-9f20-bdf7ee00c913