Computer-aided breast ultrasound (BUS) diagnosis remains a difficult task. One challenge is that imbalanced BUS datasets degrade performance, particularly accuracy on the minority (malignant tumor) class. Missed diagnoses of malignant tumors can have serious consequences, such as delayed treatment and an increased risk of death. Moreover, many diagnosis methods do not consider classification reliability, so some classifications carry large uncertainty. To address these problems, a bounded-abstaining classification model is proposed that maximizes the area under the ROC curve (AUC) under two abstention constraints. A total of 219 BUS images (92 malignant and 127 benign) were collected from the First Affiliated Hospital of Harbin Medical University, China. The experiments test BUS datasets at three imbalance levels and analyze the performance contours. The results demonstrate that AUC-rejection curves are less affected by class imbalance than accuracy-rejection curves. Compared with state-of-the-art methods, the proposed method yields a significantly larger AUC and G-mean on imbalanced BUS datasets.
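The quantities named in this abstract (abstention, rejection rate, AUC, G-mean) can be illustrated with a minimal sketch. It is not the authors' optimization procedure: the two thresholds `t_low` and `t_high` are simply given as inputs here rather than learned under abstention constraints, and all function names are hypothetical.

```python
import math


def abstaining_predict(scores, t_low, t_high):
    """Label each score 1 (malignant), 0 (benign), or None (abstain).

    Assumption: scores in the band (t_low, t_high) are treated as too
    uncertain to classify. The thresholds are supplied by the caller.
    """
    labels = []
    for s in scores:
        if s >= t_high:
            labels.append(1)
        elif s <= t_low:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def rejection_rate(y_pred):
    """Fraction of samples on which the classifier abstained."""
    return sum(p is None for p in y_pred) / len(y_pred)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity on non-abstained samples."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if p is None:
            continue  # abstained samples are excluded from the counts
        if t == 1 and p == 1:
            tp += 1
        elif t == 1 and p == 0:
            fn += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because G-mean multiplies the per-class rates, a classifier that ignores the minority class scores zero regardless of how many majority samples it gets right, which is why it is a common companion to AUC on imbalanced data.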
The authors consider the problem of fraud detection at retail self-checkouts when the dataset is unbalanced. A new ensemble-based method is proposed to solve it effectively. The method involves two main steps: preprocessing and the Random Forest algorithm. The preprocessing stage applies the following procedures to the input data in sequence: scaling by the maximal element in each column together with row-wise scaling by the Euclidean norm, weighting by correlation, and polynomial extension. The polynomial extension uses the Ito decomposition of the second degree. The method was simulated on real data, and performance was evaluated using a cost matrix. An experimental comparison with a number of existing methods (both simple models and ensembles) shows that the developed method performs best. Experiments that vary the Random Forest parameters, for both the basic algorithm and the developed method, show a significant improvement in the investigated efficiency measures for the latter. This improvement results from applying all steps of the preprocessing stage.
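The scaling and polynomial-extension steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the second-degree extension is assumed to mean the original features followed by all pairwise products (including squares), the correlation-weighting step is omitted because the abstract does not specify it, and the Random Forest stage is left out. All function names are hypothetical.

```python
import math


def scale_columns_by_max(rows):
    """Divide each column by its largest absolute value (all-zero columns are left as-is)."""
    n_cols = len(rows[0])
    col_max = [max(abs(r[j]) for r in rows) or 1.0 for j in range(n_cols)]
    return [[r[j] / col_max[j] for j in range(n_cols)] for r in rows]


def normalize_rows(rows):
    """Divide each row by its Euclidean norm (all-zero rows are left as-is)."""
    out = []
    for r in rows:
        norm = math.sqrt(sum(v * v for v in r)) or 1.0
        out.append([v / norm for v in r])
    return out


def quadratic_expansion(row):
    """Assumed degree-2 extension: linear terms, then x_i * x_j for i <= j."""
    terms = list(row)
    for i in range(len(row)):
        for j in range(i, len(row)):
            terms.append(row[i] * row[j])
    return terms


def preprocess(rows):
    """Column scaling, then row normalization, then quadratic expansion."""
    scaled = normalize_rows(scale_columns_by_max(rows))
    return [quadratic_expansion(r) for r in scaled]
```

For n input features this expansion produces n + n(n+1)/2 columns, so the feature count grows quadratically. The expanded matrix would then be passed to whatever classifier follows.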