Search results - BazTech

1

Data classification method based on a sequential discretization model [Metoda klasyfikacji danych na podstawie modelu sekwencyjnej dyskretyzacji]

Jankowski C.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2017 | no. 4 | pp. 102-106

EN

The classic scheme of supervised data mining includes a classification step preceded by data preprocessing. Discretization of numeric data is an important part of preprocessing. The classic approach does not make use, in the classification stage, of the knowledge gained during discretization, which increases the resources needed for computation. The paper presents a novel method of data classification based on a model of sequential discretization. It describes the assumptions and steps of the algorithm, gives examples illustrating how the method behaves depending on the chosen parameters, and reports the results of the experiments carried out.

2

Applications of data mining in telecommunications [Zastosowanie eksploracji danych w telekomunikacji]

Jankowski C., Mańkowski M., Zbierzchowski B.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2014 | no. 10 | pp. 1329-1334

EN

The development of computer networks and teleinformatics has made it possible to acquire great amounts of data. What matters, however, is the knowledge gained from them, which is obtained through data mining. The article presents a basic division of data mining methods and their numerous applications in telecommunications. Examples include the classic problem of filtering electronic messages, which belongs to the wider family of detecting undesirable events, and market segmentation for marketing purposes.

3

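Data mining in the context of the Knowledge Discovery in Databases (KDD) process and the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology [Eksploracja danych w kontekście procesu Knowledge Discovery In Databases (KDD) i metodologii Cross-Industry Standard Process For Data Mining (CRISP-DM)]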

Mirończuk M., Maciak T.

Metody Informatyki Stosowanej | 2009 | no. 2 (19) | pp. 65-79

EN

The article introduces readers to several problems connected with the KDD process and with modeling data mining projects using CRISP-DM. It presents systematized knowledge, approaches, and generic terms. The first part describes data exploration as one stage of the KDD cycle, a specialized knowledge discovery process. The article then turns to the CRISP-DM method and to the context of its use depending on the scale and integration of the projects concerned: an investigation of the use of text mining in the Intelligent Decision Support System (IDSS) developed by the informatics faculty of the Fire Service. The article closes with a summary of the features common to the two views of data exploration and knowledge extraction from databases.

4

Privacy Aware Data Management and Chase

Im S.

Fundamenta Informaticae | 2007 | Vol. 78, no. 4 | pp. 507-524

EN

One of the key applications that uses the knowledge discovered by data mining is called Chase. Chase is a process that replaces null or missing values with the values predicted by the knowledge, and it is mainly used to obtain more complete information systems or to replace unknown attribute values in user queries. The process improves the quality of query answers with increased volume of reliable data, and helps the system understand user queries that would otherwise be difficult. However, a security breach may occur when a set of data in an information system is confidential. The confidential data can be hidden from the public view. However, Chase has the capability to reveal the hidden data by classifying them as null or missing. In this paper, we discuss disclosure of confidential data by Chase and protection algorithms that reduce the risk. In particular, the proposed algorithms aim to protect confidential data with the least amount of additional data hiding.
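
As a rough illustration of the process described in this abstract (not the paper's algorithm), the sketch below fills missing values from rules until nothing changes. The rule format, the function name `chase`, and the sample records are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (conditions, target attribute, predicted value).
Rule = Tuple[Dict[str, str], str, str]
Record = Dict[str, Optional[str]]


def chase(records: List[Record], rules: List[Rule]) -> List[Record]:
    """Fill None values with rule predictions until no rule applies."""
    filled = [dict(r) for r in records]  # work on copies, keep the input intact
    changed = True
    while changed:
        changed = False
        for rec in filled:
            for conditions, target, value in rules:
                matches = all(rec.get(a) == v for a, v in conditions.items())
                if rec.get(target) is None and matches:
                    rec[target] = value  # replace the null with the prediction
                    changed = True
    return filled


# Illustrative data: the salary was hidden by storing it as a null.
records = [{"position": "manager", "salary": None, "bonus": None}]
rules = [
    ({"position": "manager"}, "salary", "high"),
    ({"salary": "high"}, "bonus", "yes"),  # fires only after the first rule
]
result = chase(records, rules)
```

In this toy case the hidden salary is restored from the position attribute, which is the kind of disclosure the abstract warns about; a protection algorithm would have to hide additional values so that no rule can fire.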

5

Methodology for carrying out the process of acquiring knowledge from data [Metodyka realizacji procesu pozyskiwania wiedzy z danych]

Rostek K.

Zarządzanie Przedsiębiorstwem | 2005 | Vol. 8, no. 2 | pp. 70-81

EN

The process of discovering knowledge in databases may become a strong tool for retaining competitive advantage on the insurance market. Its classification capabilities enable insurance companies to get to know their customers and their preferences, and to gain an in-depth understanding of the current policy portfolio. Thanks to the predictive properties of the process, an insurance company can respond proactively to its customers' expectations, and thus retain customers and counteract their loss to competitors. The predictive properties also allow some market behavior and risks on the part of the competition to be anticipated before they actually occur, which increases the probability of being the first to counteract. However, for the knowledge discovery and data mining processes to be completed correctly and effectively, the methodology of the process has to be adhered to and the terms and conditions for each of its stages have to be met. Many software tools have been developed to support knowledge discovery in databases. However, even the best programs will not solve all the problems connected with completing the process, and they are not sufficient to ensure the success of the project as a whole. An effective and efficient operating methodology is necessary. The methodology presented here has been developed on the basis of the known methodologies of computer tools for data mining (specifically the SAS and SPSS methodologies) and of the author's own professional experience.
The DAD (Data-Analysis-Decision) methodology comprises nine stages: formulating process assessments, extracting records and variables, extract processing, getting acquainted with the data, analysis of cross-correlations, analysis of multi-factor correlations, assessment of model results, transformation of results into knowledge, and assessment of the usefulness of knowledge. A detailed description of the DAD methodology is presented in this article.

6

Conscious and unconscious knowledge in shaping the diagnostic procedure [Wiedza uświadomiona i nieuświadomiona w kształtowaniu procedury diagnostycznej]

Jagielski J., Skorupska I.

Pomiary Automatyka Kontrola | 2005 | Vol. 51, no. 9 bis | pp. 151-153

EN

The article presents the basic procedure of technical diagnostics. It characterizes the knowledge that can be used in designing diagnostics and introduces the notions of conscious and unconscious knowledge. An integrated approach to the design of diagnostics is proposed, which takes into account the use of methods of knowledge discovery in databases.

7

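Projection and selection of attributes in the identification of dynamic models by methods of knowledge discovery in databases [Projekcja i selekcja atrybutów w identyfikacji modeli dynamicznych metodami odkryć wiedzy w bazach danych]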

Wachla D.

Pomiary Automatyka Kontrola | 2005 | Vol. 51, no. 9 bis | pp. 142-144

EN

The article addresses the problem of projecting and selecting attributes in databases for the purpose of building quantitative models of dynamic objects. The projection operation transforms the attributes into a multidimensional space of regressors. Next, the set of attributes that, in a functional sense, best describes the dependent variable is selected in the regressor space. The dependent variable is one or several attributes previously chosen from the regressor space. The method was verified on the problem of determining the set of independent variables that form a model of an example nonlinear dynamic system of the MISO type. The adopted assumptions, fragments of the results obtained, and conclusions from the research are presented.

8

Discovering Motifs in DNA Sequences

Guan J.W., Liu D.Y., Bell D.A.

Fundamenta Informaticae | 2004 | Vol. 59, no. 2,3 | pp. 119-134

EN

Large collections of genomic information have been accumulated in recent years, and latent in them is potentially significant knowledge for exploitation in medicine and in the pharmaceutical industry. The approach taken here to distilling such knowledge is to detect strings in DNA sequences that appear frequently, either within a given sequence (e.g. for a particular patient) or across sequences (e.g. from different patients sharing a particular medical diagnosis). Motifs are strings that occur very frequently. We present basic theory and algorithms for finding very frequent and common strings. Strings that are maximally frequent are of particular interest and, having discovered such motifs, we show briefly how to mine association rules by an existing rough-sets-based technique. Further work and applications are in progress.
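
The two notions of frequency mentioned in this abstract (within one sequence and across sequences) can be illustrated with a naive fixed-length count. This is only a sketch under stated assumptions, not the authors' algorithm: the function names, the fixed length `k`, and the thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def frequent_strings(sequence: str, k: int, min_count: int) -> Dict[str, int]:
    """Substrings of length k occurring at least min_count times in one sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {s: c for s, c in counts.items() if c >= min_count}


def common_strings(sequences: List[str], k: int, min_sequences: int) -> Set[str]:
    """Substrings of length k present in at least min_sequences sequences."""
    support: Counter = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most once per substring.
        support.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {s for s, c in support.items() if c >= min_sequences}


within = frequent_strings("ACGACGAC", k=3, min_count=2)
across = common_strings(["ACGT", "TACG", "GGGG"], k=3, min_sequences=2)
```

Counting every window is quadratic in memory for long sequences and many lengths; practical motif finders use more compact structures, which this sketch leaves out.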

9

Data mining: generation and visualisation of decision trees

Kwaśnicka H., Doczekalski M.

Systems Science | 2002 | Vol. 28, no. 3 | pp. 63-84

EN

The computer system presented in the paper was developed as a data mining tool: it allows large databases to be used as a source for generating and visualising decision trees. The designed system, DTB&V (Decision Tree Builder and Visualiser), performs data preprocessing and decision tree generation, followed by post-processing and visualisation of the trees. DTB&V was tested on a number of databases commonly employed for such tasks.

10

Filtering a set of decision rules using rule quality evaluation functions [Filtracja zbioru reguł decyzyjnych wykorzystująca funkcje oceny jakości reguł]

Sikora M.

Studia Informatica | 2001 | Vol. 22, no. 4 | pp. 57-72

EN

The paper presents algorithms that reduce the number of decision rules used in description and classification. The algorithms use rule quality functions to select the rules most important from the point of view of classification and description. Several rule quality functions are discussed, and the results of the experiments are presented. The tolerance-based rough sets model was used for the initial generation of the decision rule set.
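
The general idea of filtering rules by a quality function can be sketched as follows. The abstract does not say which quality functions the paper uses, so the measure below (precision multiplied by coverage) is an assumption chosen for illustration, as are the `Rule` fields, the threshold, and the sample counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    covered_pos: int  # examples of the rule's class that the rule covers
    covered_neg: int  # examples of other classes that the rule covers


def quality(rule: Rule, total_pos: int) -> float:
    """Assumed quality measure: precision times coverage, in [0, 1]."""
    covered = rule.covered_pos + rule.covered_neg
    if covered == 0 or total_pos == 0:
        return 0.0
    precision = rule.covered_pos / covered
    coverage = rule.covered_pos / total_pos
    return precision * coverage


def filter_rules(rules: List[Rule], total_pos: int, threshold: float) -> List[Rule]:
    """Keep only the rules whose quality reaches the threshold."""
    return [r for r in rules if quality(r, total_pos) >= threshold]


rules = [Rule("r1", covered_pos=8, covered_neg=2), Rule("r2", covered_pos=1, covered_neg=4)]
kept = filter_rules(rules, total_pos=10, threshold=0.5)
```

Here `r1` is kept and `r2` is dropped; swapping in a different quality function changes only the body of `quality`.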