Peer architectures for knowledge sharing

  • Authors:
  • Peter Mork; Alon Halevy; Peter Tarczy-Hornoch

  • Affiliations:
  • University of Washington; University of Washington; University of Washington

  • Venue:
  • Peer architectures for knowledge sharing
  • Year:
  • 2005

Abstract

Large-scale sharing of biologic information requires developing new approaches for accessing large numbers of data sources. For example, consider an experiment that measures genome-wide expression levels. Analyzing these results requires annotating them with information culled from public repositories, which are stored in separate databases, maintained by various organizations, using dissimilar nomenclature. Given the cost of manually visiting each data source, an automated solution is required. One solution to this problem is data integration, in which the user is provided a logical view of the underlying sources. These systems provide syntactic and semantic mediation, but generating the mediated view requires consensus concerning the intended semantics. In biomedical research it is neither feasible nor desirable to establish wide-scale consensus. Thus, a new framework for information is required in which each participant can make its own semantic decisions.

This dissertation demonstrates that data integration technology can be extended to meet the challenges of integrating biologic information. These extensions support knowledge sharing in a peer-based environment. Each participant gains the benefits of data integration without the burden of establishing consensus a priori.

We begin by establishing a framework for peer-based integration in which software components can easily be extended. We validate this framework by implementing multiple systems within the framework. We then describe a novel query language that allows a query to constrain results based on data and metadata. This language blurs the traditional distinction between instances and schemata and serves as the public interface to a number of online genetic databases. In a peer-based system, schemata are coordinated using semantic mappings. Given a query posed against one schema, we provide a reformulation algorithm that generates equivalent rewritings based on mappings expressed as logical definitions. We experimentally validate that these mappings are more expressive than traditional declarative mappings using queries. Finally, we describe a set of rules for propagating changes to the underlying data, which are used to identify efficient strategies for maintaining data replicas.

These contributions demonstrate that data integration technology can be extended to support peer-based knowledge integration, with which we are better able to meet the particular needs of biologic researchers.