The purpose of entity resolution (ER) is to identify records that refer to the same real-world entity from diferent sources. Most traditional ER studies identify records based on string-based data, so the ER problem relies mostly on string comparison techniques. There is little research on numeric-based data. Traditional ER approaches are widely used in many domains, such as papers, gene sequencing and restaurants, but they have not been used in an earthquake disaster. In this paper, earthquake disaster event information that was collected from diferent websites is denoted with numeric data. To solve the problem of ER in numeric data, we use the following methods to conduct experiments. First, we treat numbers as strings and use string-based approaches. Second, we use the Euclidean distance to measure the diference between two records. Third, we combine the above two strategies and use a comprehensive approach to measure the distance between the two records. We experimentally evaluate our methods on real datasets that represent earthquake disaster event information. The experimental results show that a comprehensive approach can achieve high performance.
JavaScript jest wyłączony w Twojej przeglądarce internetowej. Włącz go, a następnie odśwież stronę, aby móc w pełni z niej korzystać.