TupleRank: ranking discovered content in virtual databases

  • Authors:
  • Jacob Berlin; Amihai Motro

  • Affiliations:
  • Information and Software Engineering Department, George Mason University, Fairfax, VA; Information and Software Engineering Department, George Mason University, Fairfax, VA

  • Venue:
  • NGITS'06 Proceedings of the 6th international conference on Next Generation Information Technologies and Systems
  • Year:
  • 2006

Abstract

Recently, the problem of data integration has been addressed anew by methods based on machine learning and discovery. Such methods are intended to automate, at least in part, the laborious process of information integration, by which existing data sources are incorporated into a virtual database. Essentially, these methods scan new data sources, attempting to discover possible mappings to the virtual database. Like all discovery processes, this process is intrinsically probabilistic; that is, each discovery is associated with a value that denotes the assurance of its appropriateness. Consequently, the rows in a discovered virtual table have mixed assurance levels, with some rows being more credible than others. We argue that rows in discovered virtual databases should be ranked, and we describe a method, called TupleRank, for calculating such a ranking order. Roughly speaking, TupleRank calibrates the probabilities calculated during a discovery process with historical information about the performance of the system. The work is done in the framework of the Autoplex system for discovering content for virtual databases, and initial experimentation is reported and discussed.
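
The abstract does not give TupleRank's formula, so the following is only an illustrative sketch of the general idea of calibrating discovery probabilities against historical performance and then ranking rows by the calibrated score. It uses simple histogram binning as a stand-in technique, which is an assumption and not necessarily what TupleRank or Autoplex does. All names here (`fit_calibration`, `calibrate`, `rank_rows`, the `history` format, the bin count) are hypothetical.

```python
def fit_calibration(history, n_bins=5):
    """Build a calibration table from past discoveries.

    history: list of (predicted_probability, was_correct) pairs (assumed format).
    Returns one observed accuracy per probability bin; a bin with no history
    falls back to its midpoint, i.e. the raw probability is trusted as-is.
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for p, correct in history:
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        totals[b] += 1
        hits[b] += 1 if correct else 0
    return [
        hits[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(p, table):
    """Replace a raw probability with the historical accuracy of its bin."""
    n_bins = len(table)
    return table[min(int(p * n_bins), n_bins - 1)]


def rank_rows(rows, table):
    """rows: list of (row, raw_probability) pairs.

    Returns (row, calibrated_score) pairs, most credible first.
    """
    scored = [(row, calibrate(p, table)) for row, p in rows]
    return sorted(scored, key=lambda rs: rs[1], reverse=True)


# Hypothetical history: discoveries scored around 0.9 were right only one
# time in three, while those scored around 0.5 were always right.
history = [(0.9, False), (0.9, False), (0.9, True), (0.5, True), (0.5, True)]
table = fit_calibration(history)
ranking = rank_rows([("a", 0.9), ("b", 0.5)], table)
```

In this toy history the raw probabilities would place row `a` first, but the calibrated scores place row `b` first, which is the kind of reordering that calibration against past performance can produce.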