Toward Multidatabase Mining: Identifying Relevant Databases

  • Authors:
  • Huan Liu;Hongjun Lu;Jun Yao

  • Affiliations:
  • -;-;-

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Various tools and systems for knowledge discovery and data mining are developed and available for applications. However, when we are immersed in heaps of databases, an immediate question is where we should start mining. It is not true that the more databases, the better for data mining. It is only true when the databases involved are relevant to a task at hand. In this paper, breaking away from the conventional data mining assumption that many databases be joined into one, we argue that the first step for multidatabase mining is to identify databases that are most likely relevant to an application; without doing so, the mining process can be lengthy, aimless, and ineffective. A measure of relevance is thus proposed for mining tasks with an objective of finding patterns or regularities about certain attributes. An efficient algorithm for identifying relevant databases is described. Experiments are conducted to verify the measure's performance and to exemplify its application.