Discovering and using semantics for database schemas

  • Authors: Yuan An

  • Affiliations: University of Toronto (Canada)

  • Venue: Discovering and using semantics for database schemas
  • Year: 2007

Abstract

This dissertation studies the problem of discovering and using semantics for structured and semi-structured data, such as relational databases and XML documents. Semantics is captured in terms of mappings from a database schema to conceptual schemas/ontologies. Data semantics lies at the heart of data integration, the problem of sharing data across disparate sources. To address this problem, database researchers have proposed a host of solutions, including federated databases, data warehousing, mediator-wrapper-based data integration systems, peer-to-peer data management systems, and, more recently, data spaces. In the Semantic Web community, the solution to the problem of providing machine-understandable data for better web-wide information retrieval and exchange is to annotate web data using formal domain ontologies. A central issue in all of these solutions is the problem of capturing the semantics of the data to be integrated.
This dissertation describes our solutions for discovering semantics for data and using the semantics to facilitate the discovery of schema mappings. First, we develop a semi-automatic tool, MAPONTO, for discovering semantics for a database schema in terms of a given conceptual model (hereafter CM). The tool takes as input a relational or XML database schema, a CM covering the same domain as the database, and a set of simple element correspondences from schema elements to datatype properties in the CM. It then generates a set of logical formulas that define a mapping from the schema to the CM. The key is to align the integrity constraints in the schema with the semantic constructs in the CM, guided by standard database design principles. Second, we extend MAPONTO with a semantic approach to finding schema mapping expressions. The approach leverages the semantics of schemas expressed in terms of CMs.
We present experimental results demonstrating that MAPONTO saves significant human effort in discovering the semantics of database schemas and that it outperforms traditional mapping techniques for building complex schema mapping expressions in terms of both recall and precision. The development of MAPONTO provides a suite of practical tools for recovering semantics for database-resident data and generating improved schema mapping results for data integration.
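
To make the inputs and output described in the abstract more concrete, the following is a minimal illustrative sketch in Python. It is not MAPONTO's algorithm or data model: the `Correspondence` class, the `toy_formula` function, and the example table, concept, and property names are all hypothetical. The sketch only shows the shape of the problem (a schema, a CM, and element correspondences going in, a logical formula coming out) and deliberately skips the alignment of integrity constraints with CM constructs that the abstract identifies as the key step.

```python
from dataclasses import dataclass


# Hypothetical toy structures for illustration only; not MAPONTO's data model.
@dataclass(frozen=True)
class Correspondence:
    column: str   # schema element, e.g. "employee.ename"
    concept: str  # CM class that owns the datatype property
    prop: str     # datatype property in the CM


def toy_formula(table, columns, correspondences, relationships):
    """Build one formula string relating a table to atoms over the CM.

    relationships is a list of (name, source_concept, target_concept)
    edges in the CM. This toy version simply links every pair of
    concepts touched by the correspondences; it does not reason about
    keys, foreign keys, or cardinalities.
    """
    by_column = {c.column: c for c in correspondences}
    variables = {}  # concept name -> variable standing for an instance
    atoms = []
    for col in columns:
        c = by_column[f"{table}.{col}"]
        if c.concept not in variables:
            variables[c.concept] = f"x{len(variables) + 1}"
            atoms.append(f"{c.concept}({variables[c.concept]})")
        atoms.append(f"{c.prop}({variables[c.concept]}, {col})")
    for name, src, dst in relationships:
        if src in variables and dst in variables:
            atoms.append(f"{name}({variables[src]}, {variables[dst]})")
    head = f"{table}({', '.join(columns)})"
    return f"{head} -> " + " AND ".join(atoms)


# Assumed example inputs: one table, two CM classes, one CM relationship.
correspondences = [
    Correspondence("employee.ename", "Employee", "hasName"),
    Correspondence("employee.dname", "Department", "hasDeptName"),
]
relationships = [("worksFor", "Employee", "Department")]

formula = toy_formula("employee", ["ename", "dname"], correspondences, relationships)
print(formula)
```

In this toy setting the CM relationship to use is obvious because there is only one. With several candidate connections between the same concepts, a real tool would need a principled way to choose among them, which is where the schema's integrity constraints and database design principles mentioned in the abstract would come in.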