Multi-source information systems, such as data warehouses, draw on a set of heterogeneous and distributed data sources. The relevant information is extracted from these sources, cleaned, transformed, and then integrated. Comparing two data sources may reveal different kinds of heterogeneity: at the intensional level, conflicts concern the structure of the data; at the extensional level, conflicts concern the instances of the data. The process of detecting and resolving conflicts at the extensional level is known as data cleaning. In this paper, we focus on the problem of differences in terminology and propose a solution based on the linguistic knowledge provided by a domain ontology. This approach is well suited to application domains in which data is intensively classified, such as medicine or pharmacology. The main idea is to automatically generate correspondence assertions between object instances. The user can parameterize this generation process by defining a level of accuracy expressed in terms of the domain ontology.
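The generation of correspondence assertions can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy is-a hierarchy `IS_A`, the function names, and the reading of the "level of accuracy" as a maximum number of is-a steps between two terms (`max_distance`) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy domain ontology as an is-a hierarchy (child -> parent).
# The terms are illustrative only and are not taken from the paper.
IS_A = {
    "aspirin": "analgesic",
    "paracetamol": "analgesic",
    "analgesic": "drug",
    "amoxicillin": "antibiotic",
    "antibiotic": "drug",
}


def path_to_root(term):
    """Return the term followed by its successive is-a ancestors."""
    path = [term]
    while path[-1] in IS_A:
        path.append(IS_A[path[-1]])
    return path


def ontology_distance(a, b):
    """Count is-a edges between a and b via their lowest common ancestor.

    Returns None when the two terms share no ancestor in the hierarchy.
    """
    steps_from_a = {t: i for i, t in enumerate(path_to_root(a))}
    for j, t in enumerate(path_to_root(b)):
        if t in steps_from_a:
            return steps_from_a[t] + j
    return None


def correspondence_assertions(source_a, source_b, max_distance):
    """Pair instance values whose ontology distance is within max_distance.

    max_distance stands in for the user-defined accuracy level (an assumption):
    a smaller value yields fewer, stricter correspondences.
    """
    assertions = []
    for a, b in product(source_a, source_b):
        d = ontology_distance(a, b)
        if d is not None and d <= max_distance:
            assertions.append((a, b, d))
    return assertions


if __name__ == "__main__":
    values_a = ["aspirin", "amoxicillin"]
    values_b = ["paracetamol", "analgesic"]
    for level in (1, 2):
        print(level, correspondence_assertions(values_a, values_b, level))
```

Under this reading, tightening the accuracy level keeps only pairs such as a term and its direct parent class, while loosening it also accepts sibling terms that share a parent.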