Search results - BazTech

1

Joint feature selection and classification for positive unlabelled multi-label data using weighted penalized empirical risk minimization

Teisseyre Paweł

International Journal of Applied Mathematics and Computer Science | 2022 | Vol. 32, no. 2 | 311-322

EN

We consider the positive-unlabelled multi-label scenario in which multiple target variables are not observed directly. Instead, we observe surrogate variables indicating whether or not the target variables are labelled. The presence of a label means that the corresponding variable is positive. The absence of a label means that the variable can be either positive or negative. We analyze embedded feature selection methods based on two weighted penalized empirical risk minimization frameworks. In the first approach, we introduce observation weights. The idea is to assign larger weights to observations for which the value of the true target variable is consistent with the corresponding surrogate variable. In the second approach, we consider a weighted empirical risk function which corresponds to the risk function for the true unobserved target variables. The weights in both methods depend on the unknown propensity score functions, whose estimation is a challenging problem. We propose to use very simple bounds for the propensity score, which lead to relatively simple forms of the weights. In the experiments, we analyze the predictive power of the considered methods under different labelling schemes.
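The second approach described above can be illustrated with a propensity-weighted risk of the kind commonly used in the positive-unlabelled literature. This is only a sketch under stated assumptions: the abstract gives neither the paper's exact weights nor its bounds on the propensity score, so the weighting below (1/e and 1 - 1/e for labelled observations), the function names, and the omission of the penalty term are all illustrative choices. In the multi-label case, such a risk would be computed separately for each label.

```python
import math

def log_loss(p, y):
    """Logistic loss of a predicted probability p for a binary target y."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)

def pu_weighted_risk(probs, s, e):
    """Propensity-weighted empirical risk for a single label (hypothetical helper).

    probs: predicted P(y = 1 | x) for each observation
    s:     surrogate indicators (1 = labelled, 0 = unlabelled)
    e:     assumed propensity scores P(s = 1 | y = 1, x), each in (0, 1]
    """
    total = 0.0
    for p, s_i, e_i in zip(probs, s, e):
        if s_i == 1:
            # A labelled observation is certainly positive; it is up-weighted
            # by 1/e and also enters the negative loss with weight 1 - 1/e.
            total += (1.0 / e_i) * log_loss(p, 1)
            total += (1.0 - 1.0 / e_i) * log_loss(p, 0)
        else:
            # An unlabelled observation is treated as a negative.
            total += log_loss(p, 0)
    return total / len(probs)

# Example call with an assumed constant propensity of 0.5.
risk = pu_weighted_risk([0.8, 0.3, 0.6], [1, 0, 0], [0.5, 0.5, 0.5])
```

When every propensity score equals 1, all positives are labelled and the expression reduces to the ordinary log-loss computed on the surrogate variables.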

2

A Deep Learning Approach with Stack of Sub-classifiers for Multi-label Classification of Obstructive Disease from Myocardial Perfusion SPECT

Trieu Ninh Ngan, Phung Nhu Hai, Nguyen Chi Thanh, Nguyen Thanh Trung

Annals of Computer Science and Information Systems | 2022 | Vol. 33 | 261-266

EN

Artificial intelligence applications in medical imaging, especially deep learning, have gained much attention in recent years. With computer assistance, coronary artery disease (CAD), one of the most dangerous cardiovascular diseases, can be diagnosed effectively without human intervention and effort. Many studies on predicting CAD from myocardial perfusion SPECT have been conducted and have reported impressive results. However, all existing methods only detect whether or not disease is present. They do not indicate which territories are obstructed (mainly the left anterior descending artery (LAD), left circumflex artery (LCx), and right coronary artery (RCA) territories) and thus cause CAD. To diagnose CAD in more detail, we develop new classifiers for this multi-label classification problem that achieve the highest accuracy and area under the receiver operating characteristic curve (AUC) among the methods compared. Our proposed method uses transfer learning to extract features from myocardial perfusion SPECT polar maps and a novel stack of sub-classifiers to detect the specific obstructed areas. We evaluated our methods on 801 obstructive images from a database of patients referred to a hospital from 2017 to 2019.

3

Effective multi-label classification method with applications to text document categorization

Glinka K., Zakrzewska D.

Information Systems in Management | 2016 | Vol. 5, no. 1 | 24-35

EN

The increasing number of online document repositories has resulted in growing demand for automatic categorization algorithms. In many cases, however, texts should be assigned to more than one class. This paper considers a new multi-label classification algorithm for short documents. The presented problem transformation algorithm, Labels Chain (LC), is based on relationships between labels and consecutively uses the resulting labels as new attributes in the subsequent classification steps. The method is validated by experiments on several real text datasets of restaurant reviews with different numbers of instances, using classifiers such as kNN, Naive Bayes, SVM and C4.5. The results show that the LC method performs well compared with problem transformation methods such as Binary Relevance and Label Powerset.
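The chaining idea in this abstract, where each predicted label becomes an extra attribute for the next classification step, might be sketched roughly as follows. The abstract does not specify the LC algorithm in detail, so the fixed label order, the 1-nearest-neighbour base classifier, and the function names here are assumptions made purely for illustration.

```python
def nearest_neighbour_predict(train_X, train_y, x):
    """1-NN with squared Euclidean distance (a stand-in base classifier)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def chain_predict(train_X, train_Y, x):
    """Predict labels one at a time, feeding each result into the next step."""
    aug_train = [list(row) for row in train_X]  # copies, so inputs stay intact
    aug_x = list(x)
    predicted = []
    for j in range(len(train_Y[0])):
        column = [labels[j] for labels in train_Y]
        y_hat = nearest_neighbour_predict(aug_train, column, aug_x)
        predicted.append(y_hat)
        # The label just handled becomes a new attribute: true values for the
        # training rows, the predicted value for the query document.
        for row, value in zip(aug_train, column):
            row.append(value)
        aug_x.append(y_hat)
    return predicted

# Toy data: two numeric features per document, two binary labels.
train_X = [[0.0, 0.0], [1.0, 1.0]]
train_Y = [[0, 0], [1, 1]]
labels_near_second = chain_predict(train_X, train_Y, [0.9, 0.8])
```

Binary Relevance, by contrast, would train each label's classifier on the original attributes only, ignoring the other labels.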

4

Limiting Data Exposure in Multi-Label Classification Processes

Anciaux N., Boutara D., Nguyen B., Vazirgiannis M.

Fundamenta Informaticae | 2015 | Vol. 137, no. 2 | 219-236

EN

Administrative services such as social care, tax reduction, and many others that use complex decision processes ask individuals to provide large amounts of private data in order to tailor their offer to the applicant's specific situation. This data is subsequently processed and stored by the organization. However, not all of the requested information is needed to reach the same decision. We have recently proposed an approach, termed Minimum Exposure, that reduces the quantity of information users provide in order to protect their privacy, reduce processing costs for the organization, and limit the financial loss in the case of a data breach. In this paper, we address decision-making processes based on sets of classifiers, typically multi-label classifiers. We propose a practical implementation using state-of-the-art multi-label classifiers and analyze the effectiveness of our solution on several real multi-label data sets.

5

Multi-label classification using error correcting output codes

Kajdanowicz T., Kazienko P.

International Journal of Applied Mathematics and Computer Science | 2012 | Vol. 22, no. 4 | 829-840

EN

A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to correct the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier take the same computation time irrespective of the coding utilized; (iv) in general, they are not suitable for ECOCs because they cannot benefit from the ECOC correcting abilities; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
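The noisy-channel view in this abstract can be illustrated with a small sketch: a label vector is encoded into a longer codeword, one base classifier would predict each codeword bit, and decoding corrects a limited number of wrong bits. For brevity the sketch swaps in a Hamming(7,4) code rather than the BCH and other codes studied in the article, and it simulates a classifier error by flipping one bit. The function names are hypothetical.

```python
def hamming_encode(labels):
    """Encode 4 label bits into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = labels
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming_decode(codeword):
    """Correct at most one flipped bit and return the 4 label bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based index of the error, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

# Simulate one base classifier predicting its codeword bit incorrectly.
true_labels = [1, 0, 1, 1]
noisy = hamming_encode(true_labels)
noisy[4] ^= 1
recovered = hamming_decode(noisy)
```

This particular code corrects only a single wrong bit per codeword; stronger codes trade more base classifiers for more correctable errors.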

6

Ensemble of incremental multilabel classifiers for tag recommending in social networks

Chojnacki S., Kłopotek M.

Studia Informatica | 2010 | Vol. 31, no. 2A | 9-19

EN

The purpose of this article is to present the benefits and costs of enriching a state-of-the-art, real-life tag recommender system with incremental learning mechanisms. We describe modifications to a system that successfully participated in the Online Task of the ECML/PKDD Discovery Challenge 2009. The system's architecture follows the idea of constructing a hierarchical ensemble of simple classifiers, which was implemented in various ways by the highest-performing systems in the Challenge. The system is currently integrated as a web service with the BibSonomy bookmarking portal and outperforms other algorithms in terms of effective latency. We focus on incremental learning techniques that improve the quality of the system's recommendations without raising maintainability, efficiency or reliability issues.

PL (translated)

The aim of the article is to present the benefits and risks of enriching a tag recommender system with incremental learning mechanisms. We describe modifications to a system that successfully took part in the ECML/PKDD Discovery Challenge competition. The system's architecture is based on an idea shared by the systems that achieved the highest scores in the competition and consists of hierarchically combining the results of simple classifiers. The described system is currently integrated with the BibSonomy web service and achieves the highest scores in terms of the time needed to deliver recommendations to users. In this article we focus on applying incremental learning techniques that do not cause a significant decrease in the system's speed or reduce its reliability when deployed in a research environment.