Mining for patterns in contradictory data

  • Authors:
  • Ulf Leser; Johann-Christoph Freytag

  • Affiliations:
  • Humboldt-Universität zu Berlin, Berlin, Germany; Humboldt-Universität zu Berlin, Berlin, Germany

  • Venue:
  • Proceedings of the 2004 international workshop on Information quality in information systems
  • Year:
  • 2004

Abstract

Information integration is often faced with the problem that different data sources represent the same set of real-world objects but give conflicting values for specific properties of these objects. In this paper we present a model of such conflicts and describe an algorithm for efficiently detecting patterns of conflicts in a pair of overlapping data sources. The contradiction patterns we can find are a special kind of association rule, describing regularities in conflicts that occur together with certain attribute values, pairs of attribute values, or other conflicts. We therefore adapt existing association rule mining algorithms to mine contradiction patterns. Such patterns are an important tool for human experts who try to find and resolve data quality problems using domain knowledge. We present the results of applying our method to a real-world data set from the life science domain and show how it helps to generate clean data for integrated data warehouses.
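The abstract does not spell out the conflict model or the mining algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It assumes two sources given as Python dictionaries that map a shared object key to an attribute-value record. Each object present in both sources becomes one transaction containing agreement items and conflict items, and rules of the form "item X implies a conflict on attribute B" are then scored by support and confidence. The item encoding, the function names `conflict_transactions` and `mine_contradiction_rules`, and the threshold defaults are all hypothetical. The rule counting is a brute-force simplification restricted to single-item antecedents; a real adaptation of an association rule miner such as Apriori would also find antecedents with several items, such as pairs of attribute values.

```python
from collections import Counter


def conflict_transactions(src_a, src_b):
    """Turn each object present in both sources into one transaction.

    Hypothetical encoding (illustrative, not the paper's model):
    ("val", attr, value) means both sources agree on attr;
    ("conflict", attr) means they give different values for attr.
    """
    transactions = []
    for key in src_a.keys() & src_b.keys():  # objects in the overlap
        rec_a, rec_b = src_a[key], src_b[key]
        items = set()
        for attr in rec_a.keys() & rec_b.keys():  # attributes both sources give
            if rec_a[attr] == rec_b[attr]:
                items.add(("val", attr, rec_a[attr]))
            else:
                items.add(("conflict", attr))
        transactions.append(frozenset(items))
    return transactions


def mine_contradiction_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Score rules X -> conflict(B) with a single-item antecedent X.

    Brute-force simplification: every ordered pair of items in a transaction
    is counted, so this is only practical for small, illustrative inputs.
    """
    n = len(transactions)
    if n == 0:
        return []
    item_counts = Counter()
    rule_counts = Counter()
    for t in transactions:
        item_counts.update(t)
        for ante in t:
            for cons in t:
                # Consequents are restricted to conflict items; antecedents
                # may be agreement items or other conflicts.
                if cons != ante and cons[0] == "conflict":
                    rule_counts[(ante, cons)] += 1
    rules = []
    for (ante, cons), count in rule_counts.items():
        support = count / n
        confidence = count / item_counts[ante]
        if support >= min_support and confidence >= min_confidence:
            rules.append((ante, cons, support, confidence))
    # Highest-confidence patterns first, ties broken by support.
    rules.sort(key=lambda r: (r[3], r[2]), reverse=True)
    return rules


if __name__ == "__main__":
    # Toy data (invented for illustration): both sources agree on the organism
    # but disagree on the sequence length for the two human entries.
    src_a = {1: {"organism": "human", "length": 100},
             2: {"organism": "human", "length": 200},
             3: {"organism": "mouse", "length": 300}}
    src_b = {1: {"organism": "human", "length": 110},
             2: {"organism": "human", "length": 210},
             3: {"organism": "mouse", "length": 300}}
    for ante, cons, sup, conf in mine_contradiction_rules(
            conflict_transactions(src_a, src_b)):
        print(ante, "->", cons, f"support={sup:.2f}", f"confidence={conf:.2f}")
    # prints: ('val', 'organism', 'human') -> ('conflict', 'length') support=0.67 confidence=1.00
```

On the toy data, the single surviving rule says that whenever the sources agree that an entry is human, they conflict on its length, which is the kind of regularity a domain expert could then investigate.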