Utility-based resolution of data inconsistencies

  • Authors: Amihai Motro; Philipp Anokhin; Aybar C. Acar

  • Affiliations: George Mason University; George Mason University; George Mason University

  • Venue: Proceedings of the 2004 international workshop on Information quality in information systems
  • Year: 2004

Abstract

A virtual database system is software that provides unified access to multiple information sources. If the sources overlap in their contents and are independently maintained, then the likelihood of inconsistent answers is high. Solutions are often based on ranking (which sorts the different answers according to recurrence) and on fusion (which synthesizes a new value from the different alternatives according to a specific formula). In this paper we argue that both methods are flawed, and we offer alternative solutions that are based on knowledge about the performance of the source data, including features such as recentness, availability, accuracy and cost. These features are combined in a flexible utility function that expresses the overall value of a data item to the user. Utility allows us to (1) define a meaningful ranking on the inconsistent set of answers, and offer the top-ranked answer as a preferred answer; (2) determine whether a fusion value is indeed better than the initial values, by calculating its utility and comparing it to the utilities of the initial values; and (3) discover the best fusion: the fusion formula that optimizes the utility. The advantages of such performance-based and utility-driven ranking and fusion are considerable.
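
To make the idea of utility-driven ranking and fusion concrete, here is a minimal Python sketch. It is not the authors' formulation: the abstract does not give the utility function, so the weighted sum, the weights, the [0, 1] feature scale, and the rule for deriving the features of a fused value (mean of the inputs' features, with costs added up) are all assumptions made for illustration. The names `Candidate`, `utility`, `rank`, `fuse_mean`, and `fusion_is_better` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Candidate:
    """One answer to a query, with performance features of its source.

    Assumption: every feature is normalized to [0, 1]; cost is "higher is worse".
    """
    value: float
    recentness: float
    availability: float
    accuracy: float
    cost: float


# Assumed user preferences; the weights are illustrative and sum to 1.
WEIGHTS = {"recentness": 0.3, "availability": 0.1, "accuracy": 0.5, "cost": 0.1}


def utility(c: Candidate, weights: dict = WEIGHTS) -> float:
    """Assumed utility: a weighted sum of the features, with cost inverted."""
    return (
        weights["recentness"] * c.recentness
        + weights["availability"] * c.availability
        + weights["accuracy"] * c.accuracy
        + weights["cost"] * (1.0 - c.cost)
    )


def rank(candidates: list) -> list:
    """Sort inconsistent answers by utility, best first."""
    return sorted(candidates, key=utility, reverse=True)


def fuse_mean(candidates: list) -> Candidate:
    """Fuse by averaging the values.

    Assumption: the fused item's features are the means of the inputs'
    features, except cost, which is the sum of the inputs' costs (capped
    at 1) because every source has to be consulted.
    """
    return Candidate(
        value=mean(c.value for c in candidates),
        recentness=mean(c.recentness for c in candidates),
        availability=mean(c.availability for c in candidates),
        accuracy=mean(c.accuracy for c in candidates),
        cost=min(1.0, sum(c.cost for c in candidates)),
    )


def fusion_is_better(candidates: list) -> bool:
    """Check whether the fused value beats the best initial value."""
    best_initial = max(utility(c) for c in candidates)
    return utility(fuse_mean(candidates)) > best_initial
```

Under these assumptions, `rank(answers)[0]` is the preferred answer, and `fusion_is_better(answers)` reports whether averaging actually improves on the best individual answer. Comparing several fusion functions by the utility of their results would be one way to approximate the search for a best fusion.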