Software developers now widely use instant messaging and collaboration platforms, which help them explore new technologies, raise development-related issues, and seek solutions from their peers remotely. Gitter is one such platform with a large user base. It generates a tremendous volume of data, and analyzing this data offers insight into trends in open-source software development and developers' inclination toward various technologies. Classification techniques can be deployed for this purpose. The choice of word embedding for a given dataset of text messages plays a vital role in determining classification performance. The present work compares nine word embeddings in combination with seventeen classification techniques, under both one-vs-one and one-vs-rest schemes, on the GitterCom dataset to categorize text messages into one of the predetermined classes based on their purpose. In addition, two feature selection methods have been applied, and the SMOTE technique has been used to handle data imbalance. This resulted in a total of 612 classification pipelines for analysis. The experimental results show that Word2Vec, GloVe with vector size 300, and GloVe with vector size 100 are the three top-performing word embeddings when performance is considered across the different classification techniques. Models trained using ANOVA-selected features performed similarly to models trained using all features. Finally, applying SMOTE improves the models' predictive ability.
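The ANOVA-based feature selection mentioned above can be illustrated with a minimal sketch. It assumes each message has already been turned into a numeric embedding vector and that there are at least two classes with more samples than classes. The function names `anova_f_score` and `select_top_features` are hypothetical, and the abstract does not state how many features were kept or which implementation was used.

```python
from statistics import mean

def anova_f_score(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_features(X, y, top_k):
    """Return the column indices of the top_k features ranked by F score."""
    n_features = len(X[0])
    scores = [anova_f_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:top_k])
```

A feature whose class means differ strongly relative to its within-class spread receives a high F score, so keeping only the highest-scoring columns shrinks the embedding while retaining the dimensions that separate the classes best.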
Sentiment analysis for the software engineering community helps to surface important information for various tasks, including suggestions to improve code quality, defect-related comments on source code, and possibilities for improvement. Manually identifying sentiment-bearing comments can be inaccurate and time-consuming. Automating the sentiment analysis process with machine learning models can benefit software professionals by showing them, at a glance, other developers' insights and feelings about software products, libraries, development, and maintenance tasks. This study aims to develop software sentiment prediction models based on comments by (1) identifying the best embedding techniques to represent the words of the comments, not just as numbers but as vectors in n-dimensional space, (2) finding the best sets of vectors using different feature selection techniques, (3) finding the best methods to handle the class-imbalanced nature of the data, and (4) finding the best deep-learning architecture for training the models. The developed models are validated using 5-fold cross-validation with four different performance parameters (accuracy, AUC, recall, and precision) on three different datasets. The experimental findings show that models developed using word embeddings with feature selection and deep-learning classifiers on balanced data show a significant ability to predict the underlying sentiments of textual comments.
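The validation protocol described above, 5-fold cross-validation scored by accuracy, AUC, recall, and precision, can be sketched as follows. This is an illustrative simplification that assumes binary labels in {0, 1} and a classifier that outputs a score per sample. The helper names `k_fold_indices` and `binary_metrics` are hypothetical, and the abstract does not say whether the folds were shuffled or stratified.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def binary_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, and AUC for binary labels in {0, 1}."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # AUC as the chance that a positive outranks a negative (ties count 0.5).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "auc": wins / (len(pos) * len(neg)) if pos and neg else float("nan"),
    }
```

In each round one fold serves as the test set and the remaining four as the training set; the four metrics are then averaged over the five rounds.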
Software requirement classification is becoming increasingly crucial for the industry as it tries to keep up with growing project sizes. It is critical for segregating user needs, gathered from client feedback or demand, into functional and quality requirements. However, because numerous machine learning (ML) and deep-learning (DL) models require parameter tuning, the use of ML to facilitate decision-making across the software engineering pipeline is not well understood. In this study, five distinct word embedding techniques were applied to functional and quality software requirements. The imbalanced classes in the dataset are balanced using SMOTE. Feature selection and dimensionality reduction techniques are then used to remove redundant and irrelevant features: dimensionality reduction is performed with Principal Component Analysis (PCA), and feature selection with the Rank-Sum Test (RST). The resulting vectors are provided as inputs to eight distinct deep-learning classifiers for binary classification into functional and non-functional requirements. The findings show that combining word embedding and feature selection techniques with various classifiers can accurately classify functional and quality software requirements.
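Rank-sum-based feature selection of the kind named above can be sketched as follows. The sketch assumes binary labels in {0, 1}, uses the normal approximation to the Wilcoxon rank-sum statistic without a tie correction, and keeps features with p < 0.05. The threshold and the function names are assumptions for illustration, not details given in the abstract.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value via the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n1])  # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))

def select_features(X, y, alpha=0.05):
    """Keep the columns whose values differ significantly between the classes."""
    kept = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        if rank_sum_p_value(a, b) < alpha:
            kept.append(j)
    return kept
```

Because the test works on ranks rather than raw values, it makes no normality assumption about the embedding dimensions, which is one plausible reason to prefer it over a t-test here.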
This work aims to develop defect severity level prediction models that can assign severity levels to defects based on bug reports. Seven different word embedding techniques are applied to the defect descriptions to represent each word, not just as a number but as a vector in n-dimensional space. Further, three feature selection techniques are applied to find the right set of relevant vectors. The effectiveness of these word embedding techniques and of the different sets of vectors is evaluated using different classification techniques, with SMOTE used to overcome the class imbalance problem.
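The core idea of SMOTE, creating synthetic minority-class samples by interpolating between a sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name `smote_sketch`, the neighbour count `k=3`, and the fixed seed are assumptions, and the input is assumed to contain at least two minority samples.

```python
import random

def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by interpolating toward neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Sort the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Each synthetic point lies on the line segment between two real minority samples, so the minority class grows without simply duplicating existing rows.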
Background: Q&A websites such as StackOverflow or ServerFault provide an open platform for users to ask questions and get help from experts worldwide. These websites not only help users by answering their questions but also act as a knowledge base. The data on these websites can be mined to extract valuable information that can benefit software practitioners. The software engineering research community has already recognized the potential benefits of mining data from Q&A websites, and several research studies have been conducted in this area. Aim: The aim of the study presented in this paper is to perform an empirical analysis of logging questions from six popular Q&A websites. Method: We perform statistical, programming language, and content analysis of logging questions. Our analysis helped us gain insight into the logging discussions happening in six different domains of the StackExchange websites. Results: Our analysis provides insight into the logging issues of software practitioners: logging questions are pervasive across all the Q&A websites; the mean time to get an accepted answer for logging questions on the Superuser (SU) and ServerFault (SF) websites is much higher than on the other websites; a large number of logging questions invite a great amount of discussion on the SoftwareEngineering Q&A website; most of the logging issues occur in C++ and Java; the number of logging questions is increasing for Java, Python, and JavaScript, whereas it is decreasing or constant for C, C++, and C#; and on the ServerFault and Superuser websites, C is the dominant programming language.