Results found: 7

Search results
Searched in keywords: concept drift
1
Business processes are omnipresent in today's economy: companies operate repetitively to achieve their goals, e.g., deliver goods or complete orders. The business process model is the key to understanding, managing, controlling, and verifying the operations of a company. Modeling of business processes may be a legal requirement in some market segments, e.g., finance in the European Union, and a prerequisite for certification, e.g., under the ISO-9001 standard. However, business processes naturally evolve, and continuous model adaptation is essential for rapidly spotting and reacting to changes in the process. The main contribution of this work is the Continuous Inductive Miner (CIM) algorithm, which discovers and continuously adapts the process tree, an established representation of the process model, using batches of event logs of the business process. CIM combines the exclusive guarantees of its two batch predecessors, the Inductive Miner (IM) and the Inductive Miner – directly-follows-based (IMd): perfectly fitting and sound models, and single-pass event-log processing, respectively. CIM offers much shorter computation times in the update scenario than IM and IMd. CIM employs statistical information to avoid having to remember event logs (as IM must) while still ensuring a perfect fit (unlike IMd).
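The directly-follows relation that IMd-style discovery builds on can be maintained incrementally over batches of traces. The sketch below shows only that single-pass bookkeeping; the `update_dfg` helper is a hypothetical illustration, not part of CIM itself:

```python
from collections import Counter

def update_dfg(dfg: Counter, trace: list) -> Counter:
    """Add one trace's directly-follows counts to a running Counter.

    The directly-follows graph (counts of activity pairs (a, b) where b
    immediately follows a) is the single-pass summary from which
    IMd-style miners discover process trees; CIM itself is not
    reproduced here.
    """
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1
    return dfg
```

Feeding each trace of a new event-log batch through `update_dfg` keeps the summary current without storing the logs themselves.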
2
Two types of heuristic estimators based on Parzen kernels are presented. They are able to estimate the regression function in an incremental manner. The estimators apply two techniques commonly used in concept-drifting data streams: the forgetting factor and the sliding window. The methods are applicable to models in which both the function and the noise variance change over time. Although nonparametric methods based on Parzen kernels have previously been successfully applied in the literature to online regression function estimation, the problem of estimating the noise variance has generally been neglected. Knowing the variance of the signal under consideration is sometimes of profound interest in itself, e.g., in economics, but it can also be used to determine confidence intervals in the estimation of the regression function, to evaluate the goodness of fit, and to control the amount of smoothing. The present paper addresses this issue. Specifically, variance estimators are proposed that are able to deal with concept-drifting data by applying a sliding window and a forgetting factor, respectively. A number of numerical experiments showed that the proposed methods perform satisfactorily in estimating both the regression function and the noise variance.
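The forgetting-factor idea can be sketched with a kernel-weighted running-moments estimator. This is a minimal illustration of the mechanism only, not the paper's exact estimators; the class name, bandwidth, and forgetting factor are illustrative assumptions, and the variance here is a raw local second moment that also absorbs some smoothing bias:

```python
import numpy as np

class ForgettingKernelRegressor:
    """Incremental Nadaraya-Watson-style estimator with a forgetting
    factor; old observations are exponentially down-weighted so the
    estimate can track a drifting regression function."""

    def __init__(self, grid, bandwidth=0.2, forgetting=0.995):
        self.grid = np.asarray(grid, dtype=float)  # query points
        self.h = bandwidth
        self.lam = forgetting                 # weight applied to old data
        self.num = np.zeros_like(self.grid)   # kernel-weighted sum of y
        self.den = np.zeros_like(self.grid)   # kernel mass
        self.sq = np.zeros_like(self.grid)    # kernel-weighted sum of y^2

    def update(self, x, y):
        k = np.exp(-0.5 * ((self.grid - x) / self.h) ** 2)  # Gaussian kernel
        self.num = self.lam * self.num + k * y
        self.den = self.lam * self.den + k
        self.sq = self.lam * self.sq + k * y * y

    def predict(self):
        den = np.maximum(self.den, 1e-12)
        return self.num / den

    def noise_variance(self):
        # local second moment minus squared local mean; includes a
        # smoothing-bias contribution where the function has slope
        den = np.maximum(self.den, 1e-12)
        m = self.num / den
        return np.maximum(self.sq / den - m * m, 0.0)
```

A sliding-window variant would replace the exponential down-weighting with subtraction of the contributions leaving the window.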
3
The design and implementation of autonomous computational systems that incrementally learn and use what has been learnt to continually refine their learning abilities over time is still a goal far from being achieved. Such dynamic systems would conform to the main ideas of the automatic learning model conventionally characterized as never-ending learning (NEL). The never-ending approach to learning exhibits similarities to the semi-supervised (SS) model, which has been successfully implemented by bootstrap learning methods. Bootstrap learning has been one of the most successful of the SS methods proposed to date and is, as such, the natural candidate for implementing NEL systems. Bootstrap methods learn from an available labeled set of data, use the induced knowledge to label some unlabeled new data, and recurrently learn again from both sets of data in a cyclic manner. However, the use of SS methods, particularly bootstrapping methods, to implement NEL systems can give rise to a problem known as concept drift: errors that occur when the system automatically labels new unlabeled data can, over time, cause the system to run off track. The development of new strategies to lessen the impact of concept drift is an important issue that should be addressed if the goal is to increase the plausibility of developing such systems with bootstrap methods. Coupling techniques can play an important role in reducing concept-drift effects in machine learning systems, particularly those designed to perform tasks related to machine reading. This paper proposes and formalizes coupling strategies for dealing with the concept-drift problem in a NEL environment implemented as the system RTWP (Read The Web in Portuguese); initial results show that they are promising strategies for minimizing the problem under a few system settings.
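The bootstrap cycle described above can be sketched as a self-training loop. The confidence threshold below is a generic guard against accumulating labeling errors, standing in for (not reproducing) the paper's coupling strategies; the function name, model choice, and parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
    """Bootstrap (self-training) loop: learn from labeled data, label
    confident unlabeled points, and retrain on both sets, cyclically.
    Only predictions above `threshold` are accepted, a simple brake on
    the concept-drift effect of compounding self-labeling errors."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        clf = LogisticRegression().fit(X, y)
        proba = clf.predict_proba(pool)
        take = proba.max(axis=1) >= threshold
        if not take.any():
            break
        X = np.vstack([X, pool[take]])
        y = np.concatenate([y, clf.classes_[proba[take].argmax(axis=1)]])
        pool = pool[~take]
    return LogisticRegression().fit(X, y)
```

A lower threshold labels more data per cycle but lets more errors through, which is exactly the drift trade-off the paper's coupling strategies target.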
4
Incremental rule-based learners for handling concept drift: an overview
Learning from non-stationary environments is a very popular research topic. There already exist algorithms that deal with the concept drift problem. Among them are online or incremental learners, which process data instance by instance. Their knowledge representation can take different forms, such as decision rules, which have not received enough attention in learning with concept drift. This paper reviews incremental rule-based learners designed for changing environments. It describes four of the proposed algorithms: FLORA, AQ11-PM+WAH, FACIL, and VFDR. These four solutions can be compared on several criteria, such as the type of processed data, adjustment to changes, type of maintained memory, knowledge representation, and others.
5
The task of identifying changes and irregularities in medical insurance claim payments is a difficult one; traditional practice involves querying historical claims databases and flagging potential claims as normal or abnormal. Because what counts as a normal payment is usually unknown and may change over time, abnormal payments often pass undetected, only to be discovered after the payment period has passed. This paper presents the problem of online unsupervised learning from data streams when the distribution that generates the data changes or drifts over time. Automated algorithms for detecting drifting concepts in the probability distribution of the data are presented. The idea behind the presented drift detection methods is to transform the distribution of the data within a sliding window into a more convenient distribution. Then, the p-value of a test statistic at a given significance level can be used to infer the drift rate, adjust the window size, and decide on the status of the drift. The detected concept drifts are used to label the data for subsequent learning of classification models by a supervised learner. The algorithms were tested on several synthetic and real medical claims data sets.
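The window-plus-p-value idea can be sketched with a two-sample Kolmogorov-Smirnov comparison between a reference window and the most recent window. This is a minimal illustration, not the paper's distribution transform or adaptive window sizing; the p-value uses the one-term asymptotic approximation, and the class and parameter names are assumptions:

```python
import numpy as np

def ks_p_value(a, b):
    """Two-sample KS statistic with the one-term asymptotic p-value
    approximation 2*exp(-2*lambda^2); adequate for flagging clear drift."""
    a, b = np.sort(a), np.sort(b)
    data = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, data, side="right") / len(a)
    cdf_b = np.searchsorted(b, data, side="right") / len(b)
    d = np.max(np.abs(cdf_a - cdf_b))
    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = np.sqrt(n_eff) * d
    return min(1.0, 2.0 * np.exp(-2.0 * lam * lam))

class WindowDriftDetector:
    """Flag drift when the p-value of the reference-vs-recent window
    comparison falls below the significance level."""

    def __init__(self, window_size=200, alpha=0.01):
        self.window_size = window_size
        self.alpha = alpha
        self.reference = []
        self.recent = []

    def add(self, x):
        """Append one observation; return True if drift is flagged."""
        if len(self.reference) < self.window_size:
            self.reference.append(x)
            return False
        self.recent.append(x)
        if len(self.recent) > self.window_size:
            self.recent.pop(0)
        if len(self.recent) < self.window_size:
            return False
        p = ks_p_value(np.array(self.reference), np.array(self.recent))
        if p < self.alpha:
            # drift: the recent window becomes the new reference
            self.reference, self.recent = self.recent, []
            return True
        return False
```

In the unsupervised claims setting, each flagged window boundary becomes a concept label for the supervised learner downstream.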
6
Detection of concept changes in incremental learning from data streams and classifier adaptation are studied in this paper. It is often assumed that all processed learning examples are labeled, i.e., the class label is available for each example. As this assumption may be difficult to satisfy in practice, in particular in the case of data streams, we introduce an approach that detects concept drift in unlabeled data and retrains the classifier using a limited number of additionally labeled examples. The usefulness of this partly supervised approach is evaluated in an experimental study with the Enron data. This real-life data set concerns the classification of a user's emails into multiple folders. First, we show that the Enron data are characterized by frequent sudden changes of concepts. We also demonstrate that our approach can precisely detect these changes. Results of the subsequent comparative study demonstrate that our approach achieves classification accuracy comparable to two fully supervised methods: periodic retraining of the classifier based on windowing, and the trigger approach with DDM supervised drift detection. However, our approach reduces the number of examples to be labeled. Furthermore, it requires fewer classifier retrainings than windowing.
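One common way to detect drift without labels is to monitor the classifier's own prediction confidence on the incoming stream. The sketch below uses a simple mean-confidence drop over a window as the unlabeled trigger; the monitor, window size, and threshold are illustrative assumptions, not the paper's detector:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class ConfidenceDriftMonitor:
    """Flag drift in *unlabeled* data when the classifier's mean
    prediction confidence over a window drops well below the level
    seen on the first window; only then would a small batch be labeled
    and the model retrained."""

    def __init__(self, clf, window=100, drop=0.2):
        self.clf = clf          # any fitted model with predict_proba
        self.window = window
        self.drop = drop
        self.baseline = None    # mean confidence on the first window
        self.buf = []

    def observe(self, x):
        """Buffer one example; return True when a full window drifts."""
        self.buf.append(x)
        if len(self.buf) < self.window:
            return False
        conf = self.clf.predict_proba(np.array(self.buf)).max(axis=1).mean()
        self.buf = []
        if self.baseline is None:
            self.baseline = conf
            return False
        return conf < self.baseline - self.drop

    def retrain(self, X_labeled, y_labeled):
        """Retrain on a limited labeled batch; reset the baseline."""
        self.clf.fit(X_labeled, y_labeled)
        self.baseline = None
```

The labeling cost is bounded by one small batch per flagged window, which is the saving over fully supervised windowing.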
7
Solving Support Vector Machine with Many Examples
Various methods of dealing with linear support vector machine (SVM) problems with a large number of examples are presented and compared. The author believes that some interesting conclusions from this critical analysis apply to many new optimization problems and indicate the direction in which the science of optimization will branch in the future. This direction is driven by the automatic collection of large data sets to be analyzed, and is most visible in telecommunications. A stream SVM approach is proposed for the case where the data substantially exceeds the available fast random-access memory (RAM) due to a large number of examples. Formally, its use of RAM is constant in the number of examples (though it usually depends on the dimensionality of the example space). It builds an inexact polynomial model of the problem. The author's other approach is exact. It also uses a constant amount of RAM, plus auxiliary disk files, which can be large but are accessed efficiently. This approach is based on the cutting-plane method, similarly to Joachims' method (which, however, relies on finishing the optimization early).
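For context, the constant-RAM streaming idea can be sketched with a standard Pegasos-style stochastic subgradient method for the linear SVM; this is a generic illustration of the memory regime, not the author's inexact polynomial model or exact cutting-plane method:

```python
import numpy as np

class StreamLinearSVM:
    """Linear SVM trained by stochastic subgradient descent on the
    hinge loss (Pegasos-style, no bias term). Memory is constant in
    the number of examples and linear in their dimensionality."""

    def __init__(self, dim, lam=1e-4):
        self.w = np.zeros(dim)
        self.lam = lam   # regularization strength
        self.t = 0       # number of examples seen

    def partial_fit(self, x, y):
        """Process one stream example: x is a vector, y is -1 or +1."""
        self.t += 1
        eta = 1.0 / (self.lam * self.t)    # decaying step size
        self.w *= 1.0 - eta * self.lam     # shrinkage = regularization
        if y * (self.w @ x) < 1.0:         # hinge-loss subgradient step
            self.w += eta * y * x

    def predict(self, x):
        return 1 if self.w @ x >= 0.0 else -1
```

Each example is touched once and discarded, so RAM holds only the weight vector, matching the "constant in the number of examples" property discussed above.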