An ontology-based method for duplicate detection in web data tables

Authors:
Patrice Buche;Juliette Dibie-Barthélemy;Rania Khefifi;Fatiha Saïs
Affiliations:
INRA - UMR IATE, Montpellier Cedex, France and LIRMM, CNRS-UM2, Montpellier, France;INRA - Mét@risk & AgroParisTech, Paris Cedex, France;LRI (CNRS & Paris-Sud 11 University)/INRIA Saclay Orsay Cedex, France;LRI (CNRS & Paris-Sud 11 University)/INRIA Saclay Orsay Cedex, France
Venue:
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Year:
2011

Abstract

This paper presents a method for detecting duplicates in semantically annotated Web data tables, driven by a domain Termino-Ontological Resource (TOR). The method relies on fuzzy semantic annotations, one of which is automatically associated with each row of a Web data table. Each annotation instantiates a composed concept of the domain TOR, which represents the semantic n-ary relationship between the columns of the table, and contains fuzzy values expressed as fuzzy sets. The method automatically detects pairs of duplicate fuzzy semantic annotations, relying on (i) knowledge declared in the domain TOR and (ii) similarity measures between fuzzy sets. Two new similarity measures are defined to compare both symbolic and numerical fuzzy values. The method has been tested on a real application in the domain of chemical risk in food.
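
To make the idea of comparing fuzzy values concrete, here is a minimal Python sketch. The abstract does not give the paper's two similarity measures, so this is not the authors' method: it substitutes a generic fuzzy Jaccard ratio (sum of pointwise minima over sum of pointwise maxima). The function names, the dictionary representation of symbolic fuzzy sets, and the trapezoid representation of numerical fuzzy sets are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Assumed representation: a symbolic fuzzy set maps labels to membership degrees in [0, 1].
SymbolicFuzzySet = Dict[str, float]
# Assumed representation: a numerical fuzzy set is a trapezoid (a, b, c, d) with a <= b <= c <= d,
# where [a, d] is the support and [b, c] is the kernel.
Trapezoid = Tuple[float, float, float, float]


def symbolic_similarity(f: SymbolicFuzzySet, g: SymbolicFuzzySet) -> float:
    """Generic fuzzy Jaccard ratio on discrete fuzzy sets (illustrative, not the paper's measure)."""
    labels = set(f) | set(g)
    inter = sum(min(f.get(x, 0.0), g.get(x, 0.0)) for x in labels)
    union = sum(max(f.get(x, 0.0), g.get(x, 0.0)) for x in labels)
    return inter / union if union > 0 else 0.0


def membership(t: Trapezoid, x: float) -> float:
    """Membership degree of x in the trapezoidal fuzzy set t."""
    a, b, c, d = t
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def numerical_similarity(s: Trapezoid, t: Trapezoid, steps: int = 1000) -> float:
    """Fuzzy Jaccard ratio approximated by sampling both membership functions on a shared grid."""
    lo, hi = min(s[0], t[0]), max(s[3], t[3])
    if hi == lo:
        # Both sets collapse to the same single crisp point.
        return 1.0
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    inter = sum(min(membership(s, x), membership(t, x)) for x in xs)
    union = sum(max(membership(s, x), membership(t, x)) for x in xs)
    return inter / union if union > 0 else 0.0
```

Under these assumptions, two annotations could be compared attribute by attribute and flagged as candidate duplicates when every similarity exceeds a chosen threshold; how the paper actually combines the measures with knowledge from the TOR is not stated in the abstract.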