Article title

Comparative Analysis of Word Embedding and Machine Learning Techniques for Classification of Software Developer Communications on Gitter

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Software developers now make wide use of instant messaging and collaboration platforms, which help them explore new technologies, raise development-related issues, and seek solutions from their peers virtually. Gitter is one such platform with a large user base. It generates a tremendous volume of data, and analyzing that data yields insights into trends in open-source software development and developers' inclination toward various technologies. Classification techniques can be deployed for this purpose. The choice of word embedding for a given dataset of text messages plays a vital role in determining classifier performance. In the present work, a comparative analysis of nine word embeddings in combination with seventeen classification techniques, each used with one-vs-one and one-vs-rest strategies, has been performed on the GitterCom dataset to categorize text messages into one of several pre-determined classes based on their purpose. Further, two feature selection methods have been applied, and the SMOTE technique has been used to handle data imbalance. This resulted in a total of 612 classification pipelines for analysis. The experimental results show that Word2Vec, GloVe with vector size 300, and GloVe with vector size 100 are the three top-performing word embeddings when performance is considered across the different classification techniques. Models trained using ANOVA-selected features performed similarly to models trained using all features. Finally, using SMOTE improves the models' predictive ability.
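The abstract names SMOTE as the technique used to handle class imbalance, but this record does not describe the authors' implementation. As a rough illustration only, the core idea (creating synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours) can be sketched in plain Python. The function name `smote`, its parameters, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Illustrative sketch of the SMOTE idea, not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D "embedding" vectors for an under-represented message class (made up).
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_class, n_new=6, k=2, seed=42)
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.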
Year
Volume
Pages
335--346
Physical description
Bibliography: 12 items, tables, illustrations.
Authors
author
  • Department of Computer Science & Information Systems BITS Pilani Hyderabad Campus
author
  • Department of Computer Engineering National Institute of Technology, Kurukshetra
author
  • Department of Computer Engineering National Institute of Technology, Kurukshetra
  • Department of Computer Science & Information Systems BITS Pilani Hyderabad Campus
Bibliography
  • 1. E. Parra, A. Ellis, and S. Haiduc, “GitterCom: A dataset of open source developer communications in Gitter,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 563–567.
  • 2. O. Ehsan, S. Hassan, M. E. Mezouar, and Y. Zou, “An empirical study of developer discussions in the Gitter platform,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 30, no. 1, pp. 1–39, 2020.
  • 3. H. Sahar, A. Hindle, and C.-P. Bezemer, “How are issue reports discussed in Gitter chat rooms?” Journal of Systems and Software, vol. 172, p. 110852, 2021.
  • 4. E. Parra, M. Alahmadi, A. Ellis, and S. Haiduc, “A comparative study and analysis of developer communications on Slack and Gitter,” Empirical Software Engineering, vol. 27, no. 2, p. 40, 2022.
  • 5. B. Lin, A. Zagalsky, M.-A. Storey, and A. Serebrenik, “Why developers are slacking off: Understanding how software teams use Slack,” in Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion, 2016, pp. 333–336.
  • 6. V. Stray, N. B. Moe, and M. Noroozi, “Slack me if you can! Using enterprise social networking tools in virtual agile teams,” in 2019 ACM/IEEE 14th International Conference on Global Software Engineering (ICGSE). IEEE, 2019, pp. 111–121.
  • 7. R. Alkadhi, T. Lata, E. Guzman, and B. Bruegge, “Rationale in development chat messages: An exploratory study,” in 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 2017, pp. 436–446.
  • 8. R. Alkadhi, J. O. Johanssen, E. Guzman, and B. Bruegge, “REACT: An approach for capturing rationale in chat messages,” in 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 2017, pp. 175–180.
  • 9. S. Beyer, C. Macho, M. Pinzger, and M. Di Penta, “Automatically classifying posts into question categories on Stack Overflow,” in Proceedings of the 26th Conference on Program Comprehension, 2018, pp. 211–221.
  • 10. A. S. M. Venigalla, C. Lakkundi, and S. Chimalakonda, “SOTagger - towards classifying Stack Overflow posts through contextual tagging (s),” 07 2019, pp. 493–496.
  • 11. E. Guzman, M. Ibrahim, and M. Glinz, “A little bird told me: Mining tweets for requirements and software evolution,” in 2017 IEEE 25th International Requirements Engineering Conference (RE). IEEE, 2017, pp. 11–20.
  • 12. S. Tiun, U. Mokhtar, S. Bakar, and S. Saad, “Classification of functional and non-functional requirement in software requirement using word2vec and fast text,” in Journal of Physics: Conference Series, vol. 1529, no. 4. IOP Publishing, 2020, p. 042077.
Notes
1. This research is funded by TestAIng Solutions Pvt. Ltd.
2. Thematic Tracks Regular Papers
3. Record prepared with funding from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-905b012b-444d-4835-9f20-bdf7ee00c913