Article title
Identifiers
Title variants
PL: Metoda detekcji podwójnych danych bazująca na rozmytym klastrowaniu (A duplicate data detection method based on fuzzy clustering)
Publication languages
Abstracts
Rapidly changing information technology makes data grow exponentially in all areas, and the quality of these huge amounts of data is a core problem. Data cleaning is an effective technology for solving data quality problems. This paper focuses on duplicate data cleaning techniques. It examines data quality at the architecture level and covers instance-level problems, single-source and multi-source problems, an application platform for cleaning duplicated records, and evaluation criteria. Building on these studies, the paper proposes an improved detection method that combines a fuzzy clustering algorithm with the Levenshtein distance for data cleaning. The method can accurately and quickly detect and remove duplicates in raw data. The paper describes the detection process for similar duplicate records, the design of the main system framework, the implementation of the system's functional modules, and an analysis of the results. The method achieves higher precision and recall than several other data cleaning methods, and these comparisons confirm its validity. The experimental results show that the proposed method is effective in the data detection and cleaning process.
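The abstract pairs the Levenshtein distance with precision and recall as the evaluation measures. As a minimal sketch, not the authors' implementation, the Python below computes the edit distance, turns it into a similarity normalised by the longer string, flags record pairs that reach a threshold, and scores the flagged pairs against labelled true duplicates. The normalisation, the 0.8 threshold, the sample records and all function names are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalised by the longer string: 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def duplicate_pairs(records: list, threshold: float = 0.8) -> set:
    """Index pairs (i, j), i < j, whose similarity reaches the (assumed) threshold."""
    return {
        (i, j)
        for i in range(len(records))
        for j in range(i + 1, len(records))
        if similarity(records[i], records[j]) >= threshold
    }


def precision_recall(found: set, truth: set) -> tuple:
    """Precision and recall of detected pairs against labelled duplicate pairs."""
    hits = len(found & truth)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical sample records, not data from the paper.
    records = ["John Smith, Warsaw", "Jon Smith, Warsaw",
               "Alice Brown, Krakow", "Alice Browne, Krakow",
               "Bob Green, Gdansk"]
    found = duplicate_pairs(records)
    print(sorted(found))                                 # [(0, 1), (2, 3)]
    print(precision_recall(found, {(0, 1), (2, 3)}))    # (1.0, 1.0)
```

Comparing every pair is quadratic in the number of records; a clustering step such as the one the paper describes is one way to restrict the comparisons to records that are already grouped together.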
The article proposes new data cleaning methods that take into account the number of instances, multiple sources, duplicate records and other evaluation criteria. The improved detection method combines a fuzzy clustering algorithm with the Levenshtein distance. In this way, duplicate data rows are quickly detected and removed.
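This record does not say exactly how the fuzzy clustering algorithm is combined with the Levenshtein distance. One plausible reading is sketched below, assuming a fuzzy c-medoids scheme, a relational variant of fuzzy c-means swapped in here because strings have no arithmetic mean. Medoids are actual records, memberships follow the fuzzy c-means formula with Levenshtein distances as dissimilarities, and each medoid is re-chosen as the record with the lowest membership-weighted distance to all records. Records that end up in the same hard-assigned cluster are candidate duplicates. The function names, the fuzzifier m = 2 and the farthest-first initialisation are hypothetical choices.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def memberships(dist, medoids, m):
    """Fuzzy c-means style membership u[i][j] of record j in the cluster of medoid i."""
    c, n = len(medoids), len(dist)
    u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        d = [dist[j][med] for med in medoids]
        zeros = [i for i in range(c) if d[i] == 0]
        if zeros:  # record coincides with a medoid: split membership among those clusters
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        for i in range(c):
            u[i][j] = 1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1)) for k in range(c))
    return u


def fuzzy_c_medoids(items, c, m=2.0, max_iter=20):
    """Relational fuzzy clustering on a Levenshtein distance matrix (hypothetical helper)."""
    n = len(items)
    dist = [[levenshtein(x, y) for y in items] for x in items]
    # Farthest-first initialisation: deterministic and spreads the medoids apart.
    medoids = [0]
    while len(medoids) < c:
        medoids.append(max(range(n), key=lambda j: min(dist[j][k] for k in medoids)))
    for _ in range(max_iter):
        u = memberships(dist, medoids, m)
        new = []
        for i in range(c):
            # New medoid: the record minimising the membership-weighted distance
            # to every record; records already chosen as medoids are skipped.
            candidates = [k for k in range(n) if k not in new]
            new.append(min(candidates,
                           key=lambda k: sum(u[i][j] ** m * dist[k][j] for j in range(n))))
        if new == medoids:
            break
        medoids = new
    u = memberships(dist, medoids, m)
    # Hard assignment: each record goes to the cluster where its membership is highest.
    labels = [max(range(c), key=lambda i: u[i][j]) for j in range(n)]
    return medoids, u, labels


if __name__ == "__main__":
    # Hypothetical sample names, not data from the paper.
    names = ["John Smith", "Jon Smith", "John Smyth",
             "Alice Brown", "Alice Browne", "Alyce Brown"]
    medoids, u, labels = fuzzy_c_medoids(names, c=2)
    for name, label in zip(names, labels):
        print(label, name)
```

Keeping the memberships, rather than only the hard labels, lets a cleaning step treat records with split membership as uncertain and compare them against more than one cluster.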
Publisher
Journal
Year
Volume
Pages
26-30
Physical description
Bibliography: 9 items, figures, tables, charts
Bibliography
- [1] Stojmenovic, X. Lin. Power-aware localized routing in wireless networks. 2001, vol. 12, no. 11, pp. 1122-1133.
- [2] H. O. Tan, I. Korpeoglu. Power Efficient Data Gathering and Aggregation in Wireless Sensor Networks. 2003, vol. 32, no. 4, pp. 66-71.
- [3] J. Yu, M.-S. Yang. A Generalized Fuzzy Clustering Regularization Model With Optimality Tests and Model Complexity Analysis. IEEE Transactions on Fuzzy Systems, 2007, vol. 15, no. 5, pp. 904-915.
- [4] S. Chatzis, T. Varvarigou. Factor Analysis Latent Subspace Modeling and Robust Fuzzy Clustering Using t-Distributions. IEEE Transactions on Fuzzy Systems, 2009, vol. 17, pp. 505-817.
- [5] M. Cardei, J. Wu, M. Lu. Improving network lifetime using sensors with adjustable sensing ranges. Sensor Networks, 2006, vol. 10, no. 2, pp. 41-49.
- [6] H. Luo, J. Luo, Y. Liu, S. K. Das. Adaptive Data Fusion for Energy Efficient Routing in Wireless Sensor Networks. IEEE Transactions on Computers, 2006, vol. 18, no. 4, pp. 1286-1299.
- [7] Y. Shen, Y. Cai, X. Xu. A shortest-path-based topology control algorithm in wireless multihop networks. Computer Communication Review, 2007, vol. 37, no. 5, pp. 29-38.
- [8] M. Zuniga, B. Krishnamachari. Analyzing the transitional region in low power wireless links. IEEE SECON'04, 2004.
- [9] K. Seada, M. Zuniga, A. Helmy, B. Krishnamachari. Energy efficient forwarding strategies for geographic routing in wireless sensor networks. ACM SenSys'04, Baltimore, MD, Nov. 2004.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BPOB-0049-0006