Ontology-Based Data Cleaning

Authors:
Zoubida Kedad;Elisabeth Métais
Venue:
NLDB '02 Proceedings of the 6th International Conference on Applications of Natural Language to Information Systems-Revised Papers
Year:
2002

Abstract

Multi-source information systems, such as data warehouses, are built on a set of heterogeneous, distributed data sources. The relevant information is extracted from these sources, cleaned, transformed, and then integrated. Comparing two data sources may reveal different kinds of heterogeneity: at the intensional level, conflicts concern the structure of the data; at the extensional level, they concern its instances. Detecting and resolving conflicts at the extensional level is known as data cleaning. In this paper, we focus on differences in terminology and propose a solution based on the linguistic knowledge provided by a domain ontology. The approach is well suited to application domains that classify data intensively, such as medicine or pharmacology. The main idea is to automatically generate correspondence assertions between object instances. The user can parameterize the generation process by defining a level of accuracy expressed in terms of the domain ontology.
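
The abstract does not spell out the generation procedure, so the following is only a minimal sketch of one plausible reading: the "level of accuracy" is taken to be a depth in an is-a hierarchy, and two values from different sources are asserted to correspond when they generalize to the same concept at that depth. The toy taxonomy, the synonym table, and the function names (`path_from_root`, `generalize`, `correspondences`) are all hypothetical and are not taken from the paper.

```python
# Toy is-a hierarchy (child -> parent). Invented for illustration only.
PARENT = {
    "analgesic": "drug",
    "antibiotic": "drug",
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
    "amoxicillin": "antibiotic",
}

# Hypothetical synonym table mapping surface terms to ontology concepts.
SYNONYMS = {"acetylsalicylic acid": "aspirin", "asa": "aspirin"}


def path_from_root(term):
    """Return the concept path from the root down to the term's concept."""
    concept = SYNONYMS.get(term.lower(), term.lower())
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def generalize(term, level):
    """Lift a term to its ancestor at depth `level` (root = 0).

    Terms that sit above `level` in the hierarchy are left unchanged.
    """
    path = path_from_root(term)
    return path[min(level, len(path) - 1)]


def correspondences(source_a, source_b, level):
    """Pair values whose generalizations at `level` coincide."""
    return [
        (a, b)
        for a in source_a
        for b in source_b
        if generalize(a, level) == generalize(b, level)
    ]


if __name__ == "__main__":
    a = ["Aspirin", "Amoxicillin"]
    b = ["ASA", "Ibuprofen"]
    for level in (2, 1, 0):
        print(level, correspondences(a, b, level))
```

In this sketch a deeper level gives stricter matching (only synonyms of the same leaf concept correspond), while a shallower level also pairs values that merely share a broader class. Whether the paper's accuracy parameter behaves this way is an assumption.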