Approximate data instance matching: a survey

Authors:
Carina Friedrich Dorneles;Rodrigo Gonçalves;Ronaldo dos Santos Mello
Affiliations:
Universidade Federal de Santa Catarina (UFSC), Centro Tecnologico (CTC), Depto. Informatica e Estatistica (INE), Campus Universitario Trindade, Florianopolis, SC, Brazil;Universidade Federal de Santa Catarina (UFSC), Centro Tecnologico (CTC), Depto. Informatica e Estatistica (INE), Campus Universitario Trindade, Florianopolis, SC, Brazil;Universidade Federal de Santa Catarina (UFSC), Centro Tecnologico (CTC), Depto. Informatica e Estatistica (INE), Campus Universitario Trindade, Florianopolis, SC, Brazil
Venue:
Knowledge and Information Systems
Year:
2011



Abstract

Approximate data matching is a central problem in several data management processes, such as data integration, data cleaning, approximate queries, and similarity search. An approximate matching process aims to determine whether two data instances represent the same real-world object. For atomic values (strings, dates, etc.), similarity functions have been defined for several value domains (person names, addresses, and so on). For matching aggregated values, such as relational tuples and XML trees, approaches range from simple functions that combine the similarity values of record attributes to sophisticated techniques based on, for example, machine learning. For complex data comparison, including structured and semistructured documents, existing approaches use both structure and data for the comparison, and may or may not take data semantics into account. This survey presents the terminology and concepts that underlie approximate data matching and discusses related work on the use of similarity functions in this area.
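
As an illustration of the simplest kind of approach mentioned in the abstract, the sketch below combines per-attribute similarity scores into a single record-level score and compares it with a threshold. It is not taken from the survey: the function names, the attribute weights, the 0.8 threshold, and the choice of Python's difflib.SequenceMatcher as the atomic string similarity function are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two atomic string values (case-insensitive).

    SequenceMatcher is only one possible choice; domain-specific functions
    (for names, addresses, dates) could be substituted here.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of attribute similarities for two flat records.

    `weights` maps attribute names to non-negative weights (hypothetical
    values chosen by the user, not prescribed by the survey).
    """
    total = sum(weights.values())
    score = sum(
        w * string_similarity(r1[attr], r2[attr]) for attr, w in weights.items()
    )
    return score / total


def is_match(r1: dict, r2: dict, weights: dict, threshold: float = 0.8) -> bool:
    """Decide whether two records are taken to denote the same object."""
    return record_similarity(r1, r2, weights) >= threshold


if __name__ == "__main__":
    a = {"name": "Carina F. Dorneles", "city": "Florianopolis"}
    b = {"name": "Carina Friedrich Dorneles", "city": "Florianópolis"}
    w = {"name": 2, "city": 1}
    print(record_similarity(a, b, w), is_match(a, b, w))
```

The weights and the threshold are the tuning knobs of such a combination function; the machine learning techniques mentioned in the abstract can be seen as ways of learning these parameters from labeled examples instead of setting them by hand.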