Similarity function recommender service using incremental user knowledge acquisition

Authors:
Seung Hwan Ryu;Boualem Benatallah;Hye-Young Paik;Yang Sok Kim;Paul Compton
Affiliations:
School of Computer Science & Engineering, University of New South Wales, Sydney, NSW, Australia (all authors)
Venue:
ICSOC'11 Proceedings of the 9th international conference on Service-Oriented Computing
Year:
2011

Abstract

Similar entity search is the task of identifying the entities that most closely resemble a given entity (e.g., a person, a document, or an image). Although many techniques for estimating similarity have been proposed, little work has addressed which of these techniques is most suitable for a given similarity analysis task. Choosing the right similarity function matters because the task is highly domain- and data-dependent. In this paper, we propose a recommender service that suggests which similarity functions (e.g., edit distance or Jaccard similarity) should be used to measure the similarity between two entities. We introduce the notion of a “similarity function recommendation rule”, which captures user knowledge about similarity functions and their usage contexts. We also present an incremental knowledge acquisition technique for building and maintaining a set of such rules.
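
To make the idea concrete, the following is a minimal sketch of the two similarity functions named in the abstract, together with a toy rule-based recommender. The abstract does not specify the rule representation, so the `Rule` and `Recommender` classes, the context dictionary and its `attribute` key, and the "most recently added rule wins" policy are all assumptions made for illustration, not the authors' design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercased, whitespace-separated tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty token sets are treated as identical
    return len(sa & sb) / len(sa | sb)


@dataclass
class Rule:
    # Hypothetical rule shape: a predicate over a usage context,
    # paired with the name of the function to recommend.
    condition: Callable[[Dict[str, str]], bool]
    function_name: str


class Recommender:
    """Toy recommender; the rule base grows one rule at a time."""

    def __init__(self, default: str) -> None:
        self.default = default
        self.rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        # Assumption: newer user knowledge overrides older rules.
        self.rules.append(rule)

    def recommend(self, context: Dict[str, str]) -> str:
        for rule in reversed(self.rules):
            if rule.condition(context):
                return rule.function_name
        return self.default


if __name__ == "__main__":
    rec = Recommender(default="edit_distance")
    # A user records that multi-word titles are better compared by tokens.
    rec.add_rule(Rule(lambda c: c.get("attribute") == "title",
                      "jaccard_similarity"))
    print(rec.recommend({"attribute": "title"}))
    print(rec.recommend({"attribute": "surname"}))
```

In this sketch a context that matches no rule falls back to the default function, and adding a rule never requires editing earlier ones, which is one simple way to read "incremental" acquisition.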