Search results - BazTech

1

Software Deterioration Control Based on Issue Reports

Bushehrian Omid, Sayari Mohsen, Shamsinejad Pirooz

e-Informatica Software Engineering Journal | 2021 | Vol. 15, no. 1 | pp. 115-132

EN

Introduction: Successive code changes during the maintenance phase may cause bad smells and anti-patterns to emerge in code, gradually leading to deterioration of the code and making it harder to maintain. Continuous Quality Control (QC) is essential in this phase to refactor the anti-patterns and bad smells.
Objectives: The objective of this research was to present a novel component called Code Deterioration Watch (CDW), to be integrated with existing Issue Tracking Systems (ITS), that helps the QC team swiftly locate the software modules most vulnerable to deterioration. The key point about the CDW is that its function is independent of code-level metrics; it is based entirely on issue-level metrics measured from ITS repositories.
Methods: An issue-level metric that properly signals bad-smell emergence was identified by mining software repositories. To measure that metric, a stream clustering algorithm called ReportChainer was proposed to spot Relatively Long Chains (RLC) of incoming issue reports, as they tell the QC team that a concentrated point of successive changes has emerged in the software.
Results: Part of the contribution of this paper is a large integrated code and issue repository of twelve medium and large open-source software products from Apache and Eclipse. Mining this repository showed a strong direct correlation (0.73 on average) between the number of issues of type "New Feature" reported on a software package and the number of bad smells of types "design" and "error prone" that emerged in that package. In addition, a strong direct correlation (0.97 on average) was observed between the length of a chain and the number of times it caused changes to a software package.
Conclusion: The existence of a direct correlation between the number of issues of type "New Feature" reported on a software package and (1) the number of bad smells of types "design" and "error prone" and (2) the value of the "CyclomaticComplexity" metric of the package justifies the idea of Quality Control based solely on issue-level metrics. A stream clustering algorithm can be effectively applied to signal the emergence of a deteriorated module.
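The correlation reported in this abstract can be illustrated with a minimal sketch. It assumes Pearson's coefficient, which the abstract does not confirm, and the per-package counts are hypothetical toy values, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts for five packages (illustration only, not from the paper):
# "New Feature" issues reported on each package, and "design" smells found in it.
new_feature_issues = [3, 8, 1, 12, 5]
design_smells = [4, 9, 2, 15, 5]

r = pearson(new_feature_issues, design_smells)
print(f"correlation: {r:.2f}")
```

A value close to 1 would indicate that packages receiving more "New Feature" issues also tend to accumulate more smells, which is the relationship the abstract relies on.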

2

A case study in text mining of discussion forum posts: Classification with bag of words and global vectors

Cichosz P.

International Journal of Applied Mathematics and Computer Science | 2018 | Vol. 28, no. 4 | pp. 787-801

EN

Despite the rapid growth of other types of social media, Internet discussion forums remain a highly popular communication channel and a useful source of text data for analyzing user interests and sentiments. Being suited to richer, deeper, and longer discussions than microblogging services, they reflect particularly well topics of long-term, persistent involvement and areas of specialized knowledge or experience. Discovering and characterizing such topics and areas with text mining algorithms is therefore an interesting and useful research direction. This work presents a case study in which selected classification algorithms are applied to posts from a Polish discussion forum devoted to psychoactive substances obtained from home-grown plants, such as hashish or marijuana. The utility of two different vector text representations is examined: the simple bag-of-words representation and the more refined embedded global vectors representation. While the former is found to work well for the multinomial naive Bayes algorithm, the latter turns out to be more useful for the other classification algorithms: logistic regression, SVMs, and random forests. The results suggest that post classification can be applied to measure the publication intensity of particular topics and, in the case of forums related to psychoactive substances, to monitor the risk of drug-related crime.
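The pairing of a bag-of-words representation with multinomial naive Bayes mentioned in this abstract can be sketched as follows. This is a minimal illustration with add-one (Laplace) smoothing, not the author's implementation; the function names, the whitespace tokenizer, and the four toy posts with their labels are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(docs):
    """Build bag-of-words counts per class from (text, label) pairs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing keeps unseen (word, class) pairs non-zero.
                likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy posts and topic labels (illustration only).
toy_posts = [
    ("seeds soil light watering", "growing"),
    ("soil nutrients light schedule", "growing"),
    ("police fine court law", "legal"),
    ("law penalty court hearing", "legal"),
]
model = train(toy_posts)
print(predict(model, "court law question"))
print(predict(model, "soil light"))
```

Counting predicted labels per time window would give the kind of topic intensity measure the abstract proposes.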

3

Formalization of Technological Knowledge in the Field of Metallurgy Using Document Classification Tools Supported with Semantic Techniques

Regulski K.

Archives of Metallurgy and Materials | 2017 | Vol. 62, iss. 2A | pp. 715-720

EN

The process of knowledge formalization is an essential part of developing decision support systems. Creating a technological knowledge base in the field of metallurgy encountered problems in acquiring and codifying reusable computer artifacts based on text documents. The aim of the work was to adapt document classification algorithms and to develop a method for the semantic integration of the resulting repository. The author used artificial intelligence tools: latent semantic indexing, rough sets, and association rule learning, with ontologies as the tool for integration. The developed methodology made it possible to create a semantic knowledge base from natural-language documents in the field of metallurgy.

4

Efficient Kernel Discriminative Geometry Preserving Projection for Document Classification

Wang Z., Sun X., Qian X.

Przegląd Elektrotechniczny | 2012 | Vol. 88, no. 5b | pp. 56-59

EN

A new dimensionality reduction algorithm called kernel discriminative geometry preserving projection (KDGPP) is proposed for document classification. By considering both intraclass geometry and interclass discrimination, KDGPP not only projects documents nonlinearly into a lower-dimensional feature space via a manifold adaptive kernel function but also reduces the computational complexity by using the Nyström method. Experimental results demonstrate that KDGPP outperforms other related algorithms in terms of effectiveness and efficiency.

PL (translated)

A new algorithm for document classification, named KDGPP (kernel discriminative geometry preserving projection), is proposed. The algorithm reduces the complexity of the numerical computations.
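
For orientation, the Nyström method named in the abstract above is usually stated as the low-rank kernel approximation below. This is the standard textbook form, not a formula taken from the paper, and the abstract does not say how KDGPP applies it. The symbols are assumptions for illustration: n is the number of documents, m (much smaller than n) is the number of sampled landmark documents, K_{nm} is the n-by-m kernel matrix between all documents and the landmarks, K_{mm} is the m-by-m kernel matrix among the landmarks, and the superscript + denotes the pseudoinverse.

```latex
K \;\approx\; \tilde{K} \;=\; K_{nm} \, K_{mm}^{+} \, K_{nm}^{\top}
```

Only n times m kernel evaluations are needed to form the right-hand side, instead of the n squared evaluations required for the full kernel matrix K.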

5

Metody zwiększające precyzyjność wyszukiwania informacji - automatyczna kategoryzacja [Methods for increasing the precision of information retrieval: automatic categorization]

Kozak K.

Studia Informatica | 2004 | Vol. 25, no. 3 | pp. 71-81

PL (translated)

This article is devoted to methods for organizing documents obtained as search results and to the use of these methods in applications. The described method relies on automatic categorization of documents, and the example application of this method, in systems that hold documents in their repository, comes from the field of medicine, specifically psychiatry.

EN

This article describes methods for organizing documents from search results and their implementation in systems. It presents an approach that automatically categorizes documents. This approach was implemented in systems that serve as repositories for scientific documents from the domain of medicine, specifically psychiatry.