Search results - BazTech

Results found: 1

Time and Space Efficient Large Scale Link Discovery using String Similarities

Karampelas Andreas, Vouros George A.

Fundamenta Informaticae

2020

Vol. 172, no. 3

pp. 299-325

This paper proposes and evaluates time- and space-efficient methods for matching entities in large data sets. The methods work by effectively pruning the candidate pairs to be matched, and they use edit distance as the string similarity metric. The paper proposes and compares three filtering methods. All three build on a basic blocking technique that organizes the target data set so that dissimilar pairs can be pruned efficiently. The methods are compared in terms of runtime and memory usage. The first method clusters entities and exploits the triangle inequality satisfied by the string similarity metric, in conjunction with the substring matching filtering rule. The second method uses only the substring matching rule. The third combines the substring matching rule with the character frequency matching filtering rule. Evaluation results show the pruning power of the different filtering methods. They also compare these methods with the string matching functionality provided by LIMES and SILK, two state-of-the-art frameworks for large-scale link discovery.
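The abstract names a blocking step and three pruning rules, but it gives no formulas. The following Python sketch shows, under our own assumptions, how filters of this kind can be chained in front of an exact edit-distance check for a hypothetical threshold `k`. It uses four tests:

- Length blocking: keep only pairs with |len(s) - len(t)| <= k.
- Character-frequency lower bound: every edit removes at most one surplus character and adds at most one missing character.
- Pigeonhole-style substring rule: split s into k + 1 segments. If ed(s, t) <= k, at least one segment must occur intact in t.
- Triangle-inequality test: this is the test a cluster-based method can apply against a cluster center.

These formulations and all function names (`edit_distance`, `frequency_lower_bound`, `substring_filter`, `triangle_prune`, `link`) are illustrative. They are not taken from the paper, LIMES, or SILK, and the sketch does not reproduce the authors' clustering.

```python
from collections import Counter


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute) via a two-row DP."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, ct in enumerate(t, start=1):
            curr[j] = min(prev[j] + 1,               # delete cs
                          curr[j - 1] + 1,           # insert ct
                          prev[j - 1] + (cs != ct))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def frequency_lower_bound(s: str, t: str) -> int:
    """Lower bound on edit_distance(s, t) computed from character counts only.

    Each edit removes at most one surplus character of s and adds at most one
    character missing from s, so the distance is >= max(surplus, deficit).
    """
    diff = Counter(s)
    diff.subtract(Counter(t))
    surplus = sum(v for v in diff.values() if v > 0)
    deficit = -sum(v for v in diff.values() if v < 0)
    return max(surplus, deficit)


def segments(s: str, k: int) -> list[str]:
    """Split s into k + 1 contiguous pieces of near-equal length."""
    n = len(s)
    bounds = [i * n // (k + 1) for i in range(k + 2)]
    return [s[bounds[i]:bounds[i + 1]] for i in range(k + 1)]


def substring_filter(s: str, t: str, k: int) -> bool:
    """Necessary condition for edit_distance(s, t) <= k.

    k edits can damage at most k of the k + 1 segments of s, so at least one
    segment must appear unchanged as a substring of t.
    """
    return any(seg in t for seg in segments(s, k))


def triangle_prune(d_s_center: int, d_t_center: int, k: int) -> bool:
    """True when |d(s, c) - d(t, c)| > k, which proves d(s, t) > k for any center c."""
    return abs(d_s_center - d_t_center) > k


def link(sources: list[str], targets: list[str], k: int) -> list[tuple[str, str]]:
    """Return (s, t) pairs with edit_distance <= k, verifying only unpruned pairs."""
    # Hypothetical blocking: index the target set by string length.
    blocks: dict[int, list[str]] = {}
    for t in targets:
        blocks.setdefault(len(t), []).append(t)

    links = []
    for s in sources:
        # Only blocks whose length differs from len(s) by at most k can match.
        for length in range(max(0, len(s) - k), len(s) + k + 1):
            for t in blocks.get(length, []):
                if frequency_lower_bound(s, t) > k:
                    continue  # pruned by the character-frequency rule
                if not substring_filter(s, t, k):
                    continue  # pruned by the substring rule
                if edit_distance(s, t) <= k:  # exact verification
                    links.append((s, t))
    return links


if __name__ == "__main__":
    print(link(["Athens", "Thessaloniki"],
               ["Athina", "Athen", "Thesaloniki", "Patras"], k=2))
```

With `k=2`, this run prints `[('Athens', 'Athen'), ('Athens', 'Athina'), ('Thessaloniki', 'Thesaloniki')]`. The frequency bound prunes "Patras" before any dynamic programming runs. Every rule here is only a necessary condition for a match, so pruning never discards a true match. The exact edit-distance computation runs only for the candidates that survive.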