A Similar Duplicate Data Detection Method Based on Fuzzy Clustering for Topology Formation

Guo, L.; Wang, W.; Chen, F.; Tang, X.; Wang, W.

Article - details

Article title

A Similar Duplicate Data Detection Method Based on Fuzzy Clustering for Topology Formation

Authors

Guo L., Wang W., Chen F., Tang X., Wang W.

Selected full texts from this journal

http://pe.org.pl/

Identifiers

Title variants

Metoda detekcji podwójnych danych bazująca na rozmytym klastrowaniu (A duplicate data detection method based on fuzzy clustering)

Publication languages

Abstracts

Rapidly changing information technology is making data grow exponentially in every field, and the quality of these huge volumes of data has become a core problem. Data cleaning is an effective technique for solving data quality problems. This paper focuses on duplicate data cleaning techniques. It studies data quality at the architectural level, instance-level problems, multi-source and single-source problems, application platforms for cleaning duplicate records, and evaluation criteria. Building on these studies, an improved detection method combines a fuzzy clustering algorithm with the Levenshtein distance for data cleaning. It can accurately and quickly detect and remove duplicate raw data. The paper presents the similar duplicate record detection process, the main system framework design, the implementation of the system function modules, and an analysis of the results. The precision and recall rates are higher than those of several other data cleaning methods, and these comparisons confirm the validity of the method. The experimental results show that the proposed method is effective for data detection and cleaning.

The article proposes new data cleaning methods that take into account the number of cases, multiple sources, duplicate records, and other evaluation criteria. The improved detection method uses a fuzzy clustering algorithm with the Levenshtein distance. In this way, duplicate data rows are quickly detected and removed.
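
The abstracts name the Levenshtein distance as the similarity measure but give no algorithmic detail, so the following is only a minimal sketch of how edit distance can drive approximate duplicate detection. The function names (`levenshtein`, `similarity`, `group_similar`), the normalization by the longer string, and the 0.8 threshold are all assumptions made for illustration. The greedy threshold grouping is a crisp stand-in, not the fuzzy clustering algorithm used in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical (assumed scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def group_similar(records, threshold=0.8):
    """Greedy grouping: a record joins the first group whose first member
    is at least `threshold` similar; otherwise it starts a new group."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


if __name__ == "__main__":
    print(group_similar(["John Smith", "Jon Smith", "Mary Jones"]))
```

Groups with more than one member are candidate duplicate sets. A fuzzy variant would assign each record a degree of membership in every group instead of a single hard assignment.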

Keywords

approximate duplicate data; fuzzy clustering; data cleaning; Levenshtein distance

czyszczenie danych (data cleaning); rozmyte klastrowanie (fuzzy clustering)

Publisher

Wydawnictwo SIGMA-NOT

Journal

Przegląd Elektrotechniczny

Year

2012

Volume

Vol. 88, no. 1b

Pages

26-30

Physical description

Bibliography: 9 items; figures, tables, charts.

Contributors

author

Guo L.

author

Wang W.

author

Chen F.

author

Tang X.

author

Wang W.

Early Warning Surveillance Intelligence, Air Force Radar Academy, Wuhan, 430019, China, radar_boss@163.com

Bibliography

[1] Stojmenovic, X. Lin. Power-aware localized routing in wireless networks, (2001) 12, No. 11, 1122-1133
[2] Huseyin Ozgur Tan, Ibrahim Korpeoglu. Power Efficient Data Gathering and Aggregation in Wireless Sensor Networks, (2003) 32, No. 4, 66-71
[3] Jian Yu, Miin-Shen Yang. A Generalized Fuzzy Clustering Regularization Model With Optimality Tests and Model Complexity Analysis, IEEE Transactions on Fuzzy Systems, (2007) 15, No. 5, 904-915
[4] Chatzis, S., Varvarigou, T. Factor Analysis Latent Subspace Modeling and Robust Fuzzy Clustering Using t-Distributions, IEEE Transactions on Fuzzy Systems, (2009) 17, 505-817
[5] M. Cardei, J. Wu, M. Lu. Improving network lifetime using sensors with adjustable sensing ranges, Sensor Networks, (2006) 10, No. 2, 41-49
[6] Luo H., Luo J., Liu Y., Das S. K. Adaptive Data Fusion for Energy Efficient Routing in Wireless Sensor Networks, IEEE Trans. on Computers, (2006) 18, No. 4, 1286-1299
[7] Yao Shen, Yunze Cai, Xiaoming Xu. A shortest-path-based topology control algorithm in wireless multihop networks, Computer Communication Review, (2007) 37, No. 5, 29-38
[8] M. Zuniga, B. Krishnamachari. Analyzing the transitional region in low power wireless links, IEEE Secon’04, 2004.
[9] K. Seada, M. Zuniga, A. Helmy, B. Krishnamachari. Energy efficient forwarding strategies for geographic routing in wireless sensor networks, in ACM Sensys’04, Baltimore, MD, Nov. 2004.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPOB-0049-0006