Background: Continuous modifications, suboptimal software design practices, and stringent project deadlines contribute to the proliferation of code smells. Detecting and refactoring these code smells is pivotal to maintaining complex and essential software systems. Neglecting them may lead to future software defects, rendering systems difficult to maintain and eventually obsolete. Supervised machine learning techniques have emerged as valuable tools for classifying code smells without needing expert knowledge or fixed threshold values. Classifier performance can be enhanced further through effective feature selection and hyperparameter optimization. Aim: To improve the performance of multiple machine learning classifiers by fine-tuning their hyperparameters with various types of meta-heuristic algorithms, including swarm-intelligence, physics-based, math-based, and bio-based ones. The resulting performance measures are compared to find the best meta-heuristic algorithm for code smell detection, and its impact is evaluated with statistical tests. Method: This study employs sixteen contemporary and robust meta-heuristic algorithms to optimize the hyperparameters of two machine learning algorithms: Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN). The No Free Lunch theorem implies that the success of an optimization algorithm in one application does not necessarily extend to others. Consequently, a rigorous comparative analysis of these algorithms is undertaken to identify the best fit for code smell detection. The implemented optimization algorithms are Arithmetic, Jellyfish Search, Flow Direction, Student Psychology Based, Pathfinder, Sine Cosine, Jaya, Crow Search, Dragonfly, Krill Herd, Multi-Verse, Symbiotic Organisms Search, Flower Pollination, Teaching Learning Based, Gravitational Search, and Biogeography-Based Optimization.
Results: For the optimized SVM, the highest accuracy, AUC, and F-measure values are 98.75%, 100%, and 98.57%, respectively, with increases in accuracy and AUC of up to 32.22% and 45.11%. For the optimized k-NN, the best accuracy, AUC, and F-measure values all reach 100%, with increases in accuracy and AUC of up to 43.89% and 40.83%. Conclusion: The optimized SVM performs best with the Sine Cosine Optimization algorithm, while k-NN attains its peak performance with the Flower Pollination Optimization algorithm. Statistical analysis confirms that optimizing machine learning classifiers with meta-heuristic algorithms improves their performance significantly. The optimized SVM excels at detecting the God Class smell, while the optimized k-NN is particularly effective at identifying the Data Class smell. Combining meta-heuristic optimization with machine learning classifiers automates the tuning process and raises classifier performance at the same time.
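The tuning loop described in this abstract can be illustrated with a minimal sketch of the Sine Cosine Algorithm, written under several assumptions. The abstract does not say which hyperparameters were tuned or how the objective was computed, so the search space (log10 of C and gamma for an RBF-kernel SVM), the bounds, and the `toy_cv_error` function are all hypothetical; `toy_cv_error` is a simple bowl-shaped stand-in for "1 minus cross-validated accuracy" and does not train any classifier.

```python
import math
import random

# Hypothetical search space: log10(C) and log10(gamma) of an RBF-kernel SVM.
BOUNDS = [(-3.0, 3.0), (-3.0, 3.0)]


def toy_cv_error(params):
    """Stand-in objective (assumption): replace with 1 - cross-validated accuracy."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


def sine_cosine_search(objective, bounds, agents=20, iterations=100, a=2.0, seed=0):
    """Minimize `objective` inside box `bounds` with a basic Sine Cosine Algorithm."""
    rng = random.Random(seed)
    # Random initial population inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(agents)]
    best = min(population, key=objective)[:]
    best_score = objective(best)
    for t in range(iterations):
        # r1 shrinks linearly from a to 0, moving from exploration to exploitation.
        r1 = a - t * a / iterations
        for agent in population:
            for j, (lo, hi) in enumerate(bounds):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                step = r1 * wave * abs(r3 * best[j] - agent[j])
                # Clamp the updated coordinate back into the search box.
                agent[j] = min(hi, max(lo, agent[j] + step))
            score = objective(agent)
            if score < best_score:
                best, best_score = agent[:], score
    return best, best_score


best_params, best_error = sine_cosine_search(toy_cv_error, BOUNDS)
print("best log10(C), log10(gamma):", best_params, "error:", best_error)
```

In a real experiment the objective would train and cross-validate the classifier for each candidate, which makes every evaluation expensive; that cost is why the population size and iteration count matter in practice.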
A code smell is a risky code pattern that impairs code maintenance. Some code smells are defined by metrics (e.g., lines of code). Unfortunately, it is not clear how to set the thresholds for these metrics. Goal: To propose a smell description language that allows querying code repositories to empirically determine the impact of metric thresholds on the severity of smells. Method: We propose a language, called McPython, that allows defining metric-based smells. We evaluate the expressiveness of the language by specifying some popular code smells. Results: McPython is a functional domain-specific language that allows defining smells as parameterized logical propositions with auxiliary functions. McPython code is translated to Python and executed on an object-oriented representation of a code repository. Its current version is capable of expressing 7 code smells. Conclusion: Despite its limitations, McPython has the potential to help in investigating the impact of code smell parameters on their severity.
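The idea of a smell as a parameterized logical proposition evaluated over an object-oriented view of a repository can be sketched in plain Python. This is not McPython syntax, which the abstract does not show; the `ClassMetrics` record, the `large_class` smell, its thresholds, and the sample repository are all hypothetical and serve only to illustrate how varying a threshold changes the set of flagged classes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ClassMetrics:
    """Hypothetical object-oriented view of one class in a repository."""
    name: str
    lines_of_code: int
    method_count: int


def large_class(max_loc: int, max_methods: int) -> Callable[[ClassMetrics], bool]:
    """Return a smell predicate parameterized by its metric thresholds."""
    def predicate(c: ClassMetrics) -> bool:
        return c.lines_of_code > max_loc or c.method_count > max_methods
    return predicate


def query(repo: Iterable[ClassMetrics], smell: Callable[[ClassMetrics], bool]) -> List[str]:
    """Names of all classes for which the smell proposition holds."""
    return [c.name for c in repo if smell(c)]


# Invented sample data, for illustration only.
repo = [
    ClassMetrics("Parser", 1200, 45),
    ClassMetrics("Token", 80, 6),
    ClassMetrics("Lexer", 420, 18),
]

# Sweep the lines-of-code threshold to see how many classes are flagged.
for max_loc in (300, 500, 1500):
    print(max_loc, query(repo, large_class(max_loc, max_methods=50)))
```

Keeping thresholds as parameters rather than constants is what makes this kind of empirical sweep possible.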
Introduction: Successive code changes during the maintenance phase may cause bad smells and anti-patterns to emerge in code, gradually degrading it and making it harder to maintain. Continuous Quality Control (QC) is essential in this phase to refactor the anti-patterns and bad smells. Objectives: The objective of this research is to present a novel component called Code Deterioration Watch (CDW), to be integrated with existing Issue Tracking Systems (ITS), that helps the QC team swiftly locate the software modules most vulnerable to deterioration. Importantly, CDW is independent of code-level metrics; it relies entirely on issue-level metrics measured from ITS repositories. Methods: An issue-level metric that properly signals bad-smell emergence was identified by mining software repositories. To measure that metric, a stream clustering algorithm called ReportChainer was proposed to spot Relatively Long Chains (RLC) of incoming issue reports, which tell the QC team that a concentrated point of successive changes has emerged in the software. Results: One contribution of this paper is a large integrated code and issue repository covering twelve medium- and large-size open-source software products from Apache and Eclipse. Mining this repository revealed a strong direct correlation (0.73 on average) between the number of issues of type "New Feature" reported on a software package and the number of bad smells of types "design" and "error prone" that emerged in that package. In addition, a strong direct correlation (0.97 on average) was observed between the length of a chain and the number of changes it caused to a software package.
Conclusion: The direct correlation between the number of issues of type "New Feature" reported on a software package and (1) the number of bad smells of types "design" and "error prone" and (2) the value of the "CyclomaticComplexity" metric of the package justifies the idea of Quality Control based solely on issue-level metrics. A stream clustering algorithm can be applied effectively to alert the QC team to the emergence of a deteriorated module.
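The kind of correlation reported in this abstract can be computed as in the sketch below. The abstract does not state which correlation coefficient was used, so Pearson's coefficient is an assumption here, and the per-package counts are invented for illustration; they are not data from the study.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented per-package counts (assumption, for illustration only).
new_feature_issues = [3, 8, 1, 12, 5]
design_smells = [10, 21, 4, 30, 9]

print("correlation:", pearson(new_feature_issues, design_smells))
```

A value near 1 would indicate that packages receiving more "New Feature" issues also tend to accumulate more smells, which is the relationship the abstract relies on.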