Software developers now widely use instant messaging and collaboration platforms to explore new technologies, raise development-related issues, and seek solutions from their peers. Gitter is one such platform with a large user base. It generates a large volume of data, the analysis of which yields insights into trends in open-source software development and developers' inclination toward various technologies. Classification techniques can be deployed for this purpose. The choice of word embedding for a given dataset of text messages plays a vital role in determining the performance of these techniques. The present work compares nine word embeddings in combination with seventeen classification techniques, using one-vs-one and one-vs-rest strategies, on the GitterCom dataset to categorize text messages into one of several predetermined classes based on their purpose. In addition, two feature selection methods were applied, and the SMOTE technique was used to handle data imbalance. This resulted in a total of 612 classification pipelines for analysis. The experimental results show that Word2Vec, GloVe with a vector size of 300, and GloVe with a vector size of 100 are the three top-performing word embeddings when performance is taken across the different classification techniques. Models trained using ANOVA-selected features performed similarly to models trained using all features. Finally, applying SMOTE improved the models' predictive ability.
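As a rough illustration of the oversampling step mentioned in this abstract, the sketch below implements the core idea of SMOTE in plain Python: each synthetic point is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, the parameter `k`, and the toy vectors are assumptions made for illustration; the abstract does not describe the implementation the study used.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D embedding vectors for an under-represented message class.
minority_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority_vectors, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies. In a real pipeline the oversampling would be applied to the training split only, so that no synthetic data leaks into evaluation.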
2
Sentiment analysis for the software engineering community helps to find important information for various tasks, including suggestions for improving code quality, defect-related comments on source code, and possibilities for improvement. Manually identifying sentiment-bearing comments can be inaccurate and time-consuming. Automating sentiment analysis with machine learning models can benefit software professionals by giving them, at a glance, other developers' insights and feelings about software products, libraries, development, and maintenance tasks. This study aims to develop software sentiment prediction models based on comments by (1) identifying the best embedding techniques for representing the words of a comment not just as a number but as a vector in n-dimensional space, (2) finding the best sets of vectors using different feature selection techniques, (3) finding the best methods for handling the class-imbalanced nature of the data, and (4) finding the best deep learning architecture for training the models. The developed models are validated using 5-fold cross-validation with four performance measures (accuracy, AUC, recall, and precision) on three different datasets. The experimental findings show that models developed using word embeddings with feature selection and deep learning classifiers on balanced data can effectively predict the underlying sentiments of textual comments.
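The validation protocol named in this abstract can be sketched in plain Python as two helpers: one that splits sample indices into five folds, and one that computes accuracy, precision, recall, and AUC from true labels and predicted scores. Model training is deliberately left out, and the names `k_fold_indices` and `binary_metrics`, the 0.5 threshold, and the toy labels and scores are all assumptions for illustration, not details taken from the study.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs.
    Real experiments would usually shuffle or stratify the data first."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall and AUC for binary labels (1 = positive)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # AUC as the probability that a random positive outscores a random
    # negative, counting ties as half a win.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": wins / (len(pos) * len(neg)) if pos and neg else float("nan"),
    }


# Hypothetical labels (1 = positive sentiment) and classifier scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3, 0.55, 0.45]

folds = k_fold_indices(len(y_true), k=5)
metrics = binary_metrics(y_true, scores)
```

In a full run, a model would be fitted on each fold's training indices, scored on its test indices, and the four metrics averaged over the five folds.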
3
Software requirement classification is becoming increasingly crucial for industry as project sizes grow. It is critical for separating user needs, expressed through client feedback or demands, into functional and quality requirements. However, because numerous machine learning (ML) and deep learning (DL) models require parameter tuning, the use of ML to facilitate decision-making across the software engineering pipeline is not well understood. In this study, five distinct word embedding techniques were applied to functional and quality software requirements. The imbalanced classes in the dataset were balanced using SMOTE. Feature selection and dimensionality reduction techniques were then used to remove redundant and irrelevant features: dimensionality reduction was performed with Principal Component Analysis (PCA), and feature selection with the Rank-Sum Test (RST). For binary classification into functional and non-functional requirements, the resulting vectors were provided as inputs to eight distinct deep learning classifiers. The findings show that combining word embedding and feature selection techniques with various classifiers can accurately classify functional and quality software requirements.
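To make the rank-sum feature selection step more concrete, the following sketch computes a two-sided Wilcoxon rank-sum p-value with the normal approximation and keeps the features whose values differ between the two classes at a chosen significance level. The function names, the threshold `alpha=0.05`, and the toy data are assumptions for illustration; the abstract does not state how the study configured the test, and PCA is not shown here.

```python
import math


def rank_sum_p_value(a, b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    (no tie correction of the variance; adequate for a sketch)."""
    combined = sorted((v, g) for g, grp in enumerate((a, b)) for v in grp)
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of ties
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability


def select_features(X, y, alpha=0.05):
    """Return indices of columns whose class-wise values differ (p < alpha)."""
    selected = []
    for col in range(len(X[0])):
        group0 = [row[col] for row, label in zip(X, y) if label == 0]
        group1 = [row[col] for row, label in zip(X, y) if label == 1]
        if rank_sum_p_value(group0, group1) < alpha:
            selected.append(col)
    return selected


# Hypothetical data: column 0 separates the classes, column 1 does not.
X = [[0.1, 1], [0.2, 4], [0.3, 5], [0.4, 8], [0.5, 9],
     [0.9, 2], [1.0, 3], [1.1, 6], [1.2, 7], [1.3, 10]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = functional, 1 = quality
selected = select_features(X, y)
```

The rank-sum test is non-parametric, so it makes no normality assumption about the embedding dimensions, which is one plausible reason to prefer it over a t-test for this kind of filtering.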