Evaluating Cluster Preservation in Frequent Itemset Integration for Distributed Databases

  • Authors:
  • Sumeet Dua;Michael P. Dessauer;Prerna Sethi

  • Affiliations:
  • Data Mining Research Laboratory, Department of Computer Science, Louisiana Tech University, Ruston, USA 71272;Data Mining Research Laboratory, Department of Computer Science, Louisiana Tech University, Ruston, USA 71272;Department of Health Informatics and Information Management, Louisiana Tech University, Ruston, USA 71272

  • Venue:
  • Journal of Medical Systems
  • Year:
  • 2011

Abstract

The medical sciences are rapidly emerging as a data-rich discipline in which the number of databases and their dimensionality increase exponentially with time. Data integration algorithms often rely upon discovering embedded, useful, and novel relationships between the feature attributes that describe the data. Such algorithms require data integration prior to knowledge discovery, which can compromise the timeliness, scalability, robustness, and reliability of the discovered knowledge. Knowledge integration algorithms offer pattern discovery on segmented and distributed databases but require sophisticated methods for merging patterns and evaluating integration quality. We propose a unique computational framework for discovering and integrating frequent sets of features from distributed databases and then exploiting them for unsupervised learning from the integrated space. Assorted indices of cluster quality are used to assess the accuracy of knowledge merging. The approach preserves significant cluster quality under various cluster distributions and noise conditions. Exhaustive experimentation is performed to further evaluate the scalability and robustness of the proposed methodology.
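To make the idea of knowledge integration concrete, the following is a minimal illustrative sketch and not the authors' algorithm, which the abstract does not specify. It assumes each site mines frequent itemsets locally by brute-force counting and that the sites' results are merged by a size-weighted average of local supports. The function names, the `max_size` parameter, the merge rule, and the toy records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for one site (hypothetical helper).

    Brute-force counting of all itemsets with up to max_size items;
    only suitable for small illustrative inputs.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def merge_itemsets(site_results, site_sizes, min_support):
    """Merge local results without pooling the raw records (assumed rule).

    An itemset that a site did not report is treated as having support 0
    there, so each merged value is a lower bound on the global support.
    """
    total = sum(site_sizes)
    weighted = Counter()
    for result, n in zip(site_results, site_sizes):
        for itemset, support in result.items():
            weighted[itemset] += support * n
    return {s: w / total for s, w in weighted.items() if w / total >= min_support}


# Toy records for two sites (made up for illustration).
site_a = [["fever", "cough"], ["fever", "cough"], ["fever"], ["rash"]]
site_b = [["fever", "cough"], ["cough"], ["fever", "cough"], ["fever", "rash"]]

results = [local_frequent_itemsets(site, 0.5) for site in (site_a, site_b)]
merged = merge_itemsets(results, [len(site_a), len(site_b)], 0.5)

for itemset, support in sorted(merged.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

In this sketch only the locally frequent itemsets and the site sizes leave each site, which is the property that distinguishes knowledge integration from integrating the data first. The merged itemsets could then serve as features for clustering, with cluster-quality indices used to compare against clustering on pooled data.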