Search results - Biblioteka Nauki

1

Data cleaning of medical data sets

100%

Widera A., Widera M., Feige D., Horoba K., Stankiewicz A.

Journal of Medical Informatics & Technologies | 2004 | Vol. 8 | MM129--140

EN

Every database system evolves over time. If the original database schema was designed to store only a limited set of abstraction classes, the system is usually improved in the traditional way (using ALTER TABLE, UPDATE and CREATE TABLE commands). However, this can be impossible from an engineering point of view or too expensive from an economic one. When transferring data from one database schema to another, an additional step called data cleaning has to be performed. This paper presents a basic sketch of a data cleaning theory based on the idea of materialised views, together with a corresponding data cleaning environment. The proposed methodology is suitable not only for data verification but also for re-engineering broken references between data fields, recreating missing rows and converting data types.
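
The kind of cleaning step this abstract describes can be sketched roughly as follows, using Python's built-in `sqlite3` module. The sketch copies rows from an old schema into a new one, converts a text column to a numeric type and recreates parent rows that are referenced but missing. All table and column names (`old_patient`, `old_visit`, `patient`, `visit`, `clean_visit`) are hypothetical, and an ordinary view stands in for the paper's materialised views because SQLite does not provide them. It illustrates the idea only and is not the authors' environment.

```python
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Copy old_patient/old_visit into a new schema, cleaning on the way."""
    conn.executescript("""
        CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE visit (
            id INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL REFERENCES patient(id),
            weight_kg REAL
        );

        -- Cleaning view: converts the text weight to a number and maps
        -- values that do not start with a digit to NULL.
        CREATE VIEW clean_visit AS
        SELECT id, patient_id,
               CASE WHEN TRIM(weight) GLOB '[0-9]*'
                    THEN CAST(REPLACE(TRIM(weight), ',', '.') AS REAL)
               END AS weight_kg
        FROM old_visit;

        INSERT INTO patient (id, name) SELECT id, name FROM old_patient;

        -- Recreate missing rows: placeholder patients for dangling references.
        INSERT INTO patient (id, name)
        SELECT DISTINCT patient_id, NULL FROM old_visit
        WHERE patient_id NOT IN (SELECT id FROM patient);

        INSERT INTO visit (id, patient_id, weight_kg)
        SELECT id, patient_id, weight_kg FROM clean_visit;
    """)
```

Keeping the conversion rules in a view means they can be inspected and queried on their own before any row is written to the new tables.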

2

100%

Guo L., Wang W., Chen F., Tang X., Wang W.

Przegląd Elektrotechniczny | 2012 | R. 88, no. 1b | 26--30

EN

Changing information technology makes data grow exponentially in all areas, and the quality of these huge amounts of data is the core problem. Data cleaning is an effective technology for solving data quality problems. This paper focuses on techniques for cleaning duplicate data. It studies data quality in terms of architectural-level and instance-level problems, multi-source and single-source problems, an application platform for cleaning duplicated records, and evaluation criteria. Within these studies, an improved detection method combines a fuzzy clustering algorithm with the Levenshtein distance for data cleaning. It can accurately and quickly detect and remove duplicate raw data. The paper covers the detection process for similar duplicate records, the design of the main system framework, the implementation of the system's function modules and an analysis of the results. The precision and recall rates are higher than those of several other data cleaning methods, which confirms the validity of the method. The experimental results show that the proposed method is effective in detecting and cleaning data.

PL (English translation)

The article proposes new data cleaning methods that take into account the number of cases, multiple sources, duplicate records and other evaluation criteria. The improved detection method uses a fuzzy clustering algorithm with the Levenshtein distance. In this way, duplicate data rows are quickly detected and removed.
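
To make the role of the Levenshtein distance in duplicate detection concrete, here is a minimal sketch. It computes the edit distance with the standard dynamic-programming recurrence and then groups records greedily by a normalised similarity threshold. This greedy grouping is a deliberately simpler stand-in for the fuzzy clustering used in the paper. The function names and the threshold of 0.8 are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def group_duplicates(records, threshold=0.8):
    """Greedy grouping: a record joins the first group whose first
    member is at least `threshold` similar after normalisation."""
    groups = []
    for rec in records:
        key = rec.strip().lower()
        for group in groups:
            if similarity(key, group[0].strip().lower()) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups
```

For example, `group_duplicates(["John Smith", "Jon Smith", "Mary Jones"])` puts the first two records in one group and the third in its own. Comparing every record against every group representative is quadratic in the worst case, so large data sets would need a blocking or indexing step first.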

3

SDAE cleaning model of wind speed monitoring data in the Mine Monitoring System

84%

Zhao D., Shen Z., Song Z., Xie L.

Archives of Mining Sciences | 2023 | Vol. 68, no. 2 | 251--266

EN

The effective utilisation of coal mine monitoring data is the core of realising an intelligent mine. The complex and challenging underground environment, coupled with unstable sensors, can result in “dirty” data in monitoring information. A reliable data cleaning method is needed to extract high-quality information from large monitoring data sets while minimising data redundancy. To this end, a cleaning method for sensor monitoring data based on stacked denoising autoencoders (SDAE) is proposed. The SDAE is trained on sample data of the ventilation system under normal conditions, and the upper limit of the reconstruction error is obtained by kernel density estimation (KDE). The Apriori algorithm is used to study the correlation between monitoring data time series. By comparing the reconstruction errors and error duration of test data with the upper limit of the reconstruction error and the tolerance time, in combination with the correlation rules, the “dirty” data is resolved. The method is tested in the Dongshan coal mine. The experimental results show that the proposed method can not only identify the dirty data but also retain the fault information. The research provides effective basic data for fault diagnosis and disaster warning.
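
The comparison against an upper error limit and a tolerance time might be sketched as below. The abstract does not spell out the decision rule, so the sketch makes an explicit assumption: a run of above-limit reconstruction errors that lasts no longer than the tolerance is labelled dirty data, while a longer run is kept as a possible fault. The reconstruction errors are taken as given; the SDAE, the KDE-based limit and the Apriori correlation rules are not reproduced. The function name and labels are hypothetical.

```python
def label_exceedances(errors, upper_limit, tolerance):
    """Label each sample 'ok', 'dirty' or 'fault' from its reconstruction error.

    ASSUMPTION (not stated in the abstract): runs of errors above
    `upper_limit` lasting at most `tolerance` samples are dirty data,
    longer runs are retained as possible faults.
    """
    labels = ["ok"] * len(errors)
    i = 0
    while i < len(errors):
        if errors[i] <= upper_limit:
            i += 1
            continue
        # Find the end of the current run of above-limit errors.
        j = i
        while j < len(errors) and errors[j] > upper_limit:
            j += 1
        kind = "dirty" if (j - i) <= tolerance else "fault"
        for k in range(i, j):
            labels[k] = kind
        i = j
    return labels
```

With `errors = [0.1, 0.9, 0.1, 0.8, 0.9, 0.7, 0.1]`, `upper_limit = 0.5` and `tolerance = 2`, the single spike at index 1 is labelled dirty and the three-sample run at indices 3 to 5 is labelled as a fault.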

4

Wpływ jakości danych mapowych na użyteczność teleinformatycznych systemów informacji logistycznej [The impact of map data quality on the usability of ICT logistics information systems]

84%

Kalbarczyk-Guzek E.

Systemy Logistyczne Wojsk | 2019 | Vol. 51, no. 2 | 57--70

PL (English translation)

The article discusses the threats posed by poor-quality map data in organisations dealing with logistics, in the context of the processes they handle. Attention is drawn to the problem of erroneous address data at both the strategic and operational levels. The problems faced by logistics companies whose databases contain incorrectly geocoded contractors are set out. Data cleaning methods available on the market, offered by specialised companies, are described.

EN

The article discusses the risks that poor map data quality poses to logistics organizations in the context of the processes they support. Attention is paid to the problem of erroneous address data at both the strategic and operational levels. The problems faced by logistics companies whose databases contain incorrectly geocoded contractors are listed. Data cleaning methods available on the market, offered by specialized companies, are described.

5

Eye tracking data cleansing for dialogue agent

67%

Gabor-Siatkowska K., Stefaniak I., Janicki A.

Biuletyn Naukowy Wrocławskiej Wyższej Szkoły Informatyki Stosowanej. Informatyka | 2023 | Vol. 10 | 1--14

EN

Eye trackers are commonly used in many research fields, e.g. education, marketing, psychology, medicine and human-computer interfaces. Although eye tracker companies provide software with built-in preprocessing algorithms for handling undesired data issues in eye-tracking data, e.g. blinks, the gathered data often has to be processed further to become useful for later analyses. In this article, we present an algorithm for preprocessing eye-tracking data, in particular for cleansing pupil diameter data. Because the detection algorithm provided by the eye-tracking software is insufficient, our algorithm takes into account the maximum velocity of human pupil contraction. Our experiments have been conducted on a Gazepoint GP3 device with a sampling frequency of 60 Hz, which is widely used in various research fields. The proposed approach enables researchers to preprocess their collected pupil data better by taking into account the behaviour of the human pupil diameter. It makes pupil data preprocessing quick and applicable to further analyses in various research fields.

PL (English translation)

Devices that track eye movements (so-called eye trackers) are commonly used in many fields, e.g. medicine, education, marketing, psychology and human-computer interfaces. Eye tracker manufacturers offer not only hardware but also suitable software that enables preliminary analysis of parameters such as gaze fixation points or the user's pupil diameter. This software usually already has a built-in algorithm for preliminary marking of undesired data (e.g. blinks). Blinks are usually an undesired phenomenon and have to be additionally processed at an early stage, before further analyses. Unfortunately, the built-in blink detection algorithm is not always sufficient for research purposes. This article describes an algorithm for preprocessing eye-tracking data, specifically pupil data; it takes into account the maximum contraction velocity of human pupils. Our experiments were conducted on a Gazepoint GP3 device with a sampling frequency of 60 Hz, which is widely available. The solution we propose enables faster and more accurate processing of these data, taking into account the properties of the human pupil, and can be widely used in experiments in various eye-tracking studies.
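
A velocity-based check of the kind this abstract mentions could look like the sketch below. It flags a pupil diameter sample when the change since the last accepted sample implies a rate faster than a caller-supplied maximum. The function name, the treatment of missing or non-positive values as blinks, and the use of the same limit for dilation and contraction are all assumptions. The velocity limit is left as a required parameter because the abstract gives no value for it.

```python
def flag_pupil_artifacts(diameters_mm, max_velocity_mm_s, sampling_hz=60.0):
    """Return one boolean per sample; True marks a sample to discard.

    A sample is flagged if it is missing (None or <= 0) or if the change
    relative to the last accepted sample exceeds `max_velocity_mm_s`.
    """
    flags = []
    last_valid = None  # (index, diameter) of the most recent accepted sample
    for i, d in enumerate(diameters_mm):
        if d is None or d <= 0:
            flags.append(True)
            continue
        if last_valid is not None:
            elapsed_s = (i - last_valid[0]) / sampling_hz
            if abs(d - last_valid[1]) / elapsed_s > max_velocity_mm_s:
                flags.append(True)
                continue
        flags.append(False)
        last_valid = (i, d)
    return flags
```

Measuring the change against the last accepted sample, rather than the immediately preceding one, prevents a single outlier from also condemning the good sample that follows it. The sketch trusts the first valid sample, so a recording that starts inside an artifact would need separate handling.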

6

Decision support and maintenance system for natural hazards, processes and equipment monitoring

67%

Kozielski M., Sikora M., Wróbel Ł.

Eksploatacja i Niezawodność | 2016 | Vol. 18, no. 2 | 218--228

EN

This paper presents the DISESOR integrated decision support system and its applications. The system integrates data from different monitoring and dispatching systems and contains modules for data preparation and cleaning, analysis, prediction and an expert system. The architecture of the system is presented, with a special focus on two issues: data integration and cleaning, and the creation of prediction models. The work also contains two case studies presenting examples of the system's application.

PL (English translation)

The paper presents the DISESOR integrated decision support system and its applications. The system allows the integration of data from various monitoring and dispatching systems. The structure of DISESOR consists of modules for data preparation and cleaning, data analysis, prediction tasks and expert system tasks. The paper presents the architecture of DISESOR, with particular emphasis on issues related to data integration and cleaning and to the creation of prediction models. The operation of the system is demonstrated on two example analyses of real data.