Automated discovery of multi-faceted ontologies for accurate query answering and future semantic reasoning

Authors:
Mohammed Gollapalli;Xue Li;Ian Wood
Affiliations:
School of Information Technology & Electrical Engineering, University of Queensland, Brisbane, Australia;School of Information Technology & Electrical Engineering, University of Queensland, Brisbane, Australia;School of Mathematics & Physics, University of Queensland, Brisbane, Australia
Venue:
Data & Knowledge Engineering
Year:
2013

Abstract

There has been a surge of interest in developing probabilistic techniques that discover meaningful data facts across multiple datasets provided by different organizations. The key aim is to approximate the structure and content of the induced data in a concise synopsis from which meaningful data facts can be extracted. Performing sensible queries across unrelated datasets is a complex task that requires a complete understanding of each contributing database's schema, which defines the structure of its information. Alternative approaches that use enterprise data modeling tools have been proposed to let users without in-depth schema knowledge query databases. Unfortunately, data modeling-based matching is a content-based technique and incurs significant query evaluation costs because of attribute-level pairwise comparisons. We propose a multi-faceted classification technique for performing structural analysis on knowledge domain clusters, using a novel Ontology Guided Data Linkage (OGDL) framework. This framework supports the self-organization of contributing databases through the discovery of structural dependencies, by performing multi-level exploitation of ontological domain knowledge relating to tables, attributes and tuples. The framework thus automates the discovery of schema structures across unrelated databases, based on direct and weighted correlations between different ontological concepts, and uses an h-gram (hash gram) record matching technique for concept clustering and cluster mapping. We demonstrate the feasibility of our OGDL algorithms through a set of accuracy, performance and scalability experiments on real-world datasets, and show that our system runs in polynomial time and performs well in practice.
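The abstract only names the h-gram record matching step, so the following is a minimal sketch of a generic stand-in, not the OGDL framework's actual h-gram algorithm. It assumes that each attribute or concept name is reduced to a set of hashed character q-grams, that candidate pairs are found through an inverted index on those hashes (so pairs sharing no hash are never compared), and that names whose Jaccard similarity clears a threshold are grouped with union-find. The function names, the gram length `q=3`, the bucket count and the threshold `0.5` are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
import zlib


def hashed_grams(text: str, q: int = 3, buckets: int = 1 << 20) -> frozenset[int]:
    """Hypothetical helper: hash buckets of a string's padded character q-grams."""
    padded = f"#{text.lower()}#"  # padding gives prefixes and suffixes their own grams
    grams = (padded[i:i + q] for i in range(max(1, len(padded) - q + 1)))
    # crc32 is deterministic across runs, unlike Python's salted built-in hash()
    return frozenset(zlib.crc32(g.encode("utf-8")) % buckets for g in grams)


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Jaccard similarity of two hashed-gram sets (padding keeps them non-empty)."""
    return len(a & b) / len(a | b)


def cluster(names: list[str], threshold: float = 0.5, q: int = 3) -> list[list[str]]:
    """Hypothetical concept clustering: single-link groups of similar names."""
    sigs = [hashed_grams(n, q) for n in names]

    # Inverted index: bucket -> ids of names containing it (ids appended in order).
    index: dict[int, list[int]] = defaultdict(list)
    for i, sig in enumerate(sigs):
        for bucket in sig:
            index[bucket].append(i)

    # Only names that share at least one hashed gram become candidate pairs.
    candidates: set[tuple[int, int]] = set()
    for ids in index.values():
        candidates.update(combinations(ids, 2))

    parent = list(range(len(names)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in candidates:
        if jaccard(sigs[i], sigs[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Clusters are ordered by first member; members keep input order.
    groups: dict[int, list[str]] = defaultdict(list)
    for i, name in enumerate(names):
        groups[find(i)].append(name)
    return list(groups.values())
```

For example, `cluster(["customer_id", "customerid", "order_date", "orderdate"])` groups the two customer identifiers together and the two order-date attributes together. The inverted index prunes pairs with no gram in common, although in the worst case (many names sharing grams) the comparison count still grows quadratically.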
To the best of our knowledge, this is the first attempt to solve data linkage problems using a multi-faceted cluster mapping strategy, and we believe that our approach represents a significant advance towards accurate query answering and future real-time online semantic reasoning capability.