Evaluating Cluster Preservation in Frequent Itemset Integration for Distributed Databases

Authors:
Sumeet Dua;Michael P. Dessauer;Prerna Sethi
Affiliations:
Data Mining Research Laboratory, Department of Computer Science, Louisiana Tech University, Ruston, USA 71272;Data Mining Research Laboratory, Department of Computer Science, Louisiana Tech University, Ruston, USA 71272;Department of Health Informatics and Information Management, Louisiana Tech University, Ruston, USA 71272
Venue:
Journal of Medical Systems
Year:
2011

Abstract

Medical sciences are rapidly emerging as a data-rich discipline in which the number of databases and their dimensionality increase exponentially with time. Data integration algorithms often rely upon discovering embedded, useful, and novel relationships between the feature attributes that describe the data. Such algorithms require data integration prior to knowledge discovery, which can undermine the timeliness, scalability, robustness, and reliability of the discovered knowledge. Knowledge integration algorithms offer pattern discovery on segmented and distributed databases, but they require sophisticated methods for merging patterns and evaluating integration quality. We propose a unique computational framework for discovering and integrating frequent sets of features from distributed databases and then exploiting them for unsupervised learning in the integrated space. Assorted indices of cluster quality are used to assess the accuracy of knowledge merging. The approach preserves significant cluster quality under various cluster distributions and noise conditions. Exhaustive experimentation is performed to further evaluate the scalability and robustness of the proposed methodology.
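
To make the idea of knowledge integration concrete, the following is a minimal illustrative sketch of mining frequent itemsets separately at each site and then merging only the resulting patterns, rather than the raw records. It is not the framework proposed in the paper, whose details the abstract does not give. The function names (`frequent_itemsets`, `integrate`), the brute-force counting, the pooled-support merge rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Count itemsets of up to max_size items in one local database and
    keep those whose count reaches min_support (a fraction of the rows).
    Brute-force enumeration, suitable only for small illustrative data."""
    counts = Counter()
    for row in transactions:
        items = sorted(set(row))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(transactions)
    return {s: c for s, c in counts.items() if c >= threshold}


def integrate(local_results, site_sizes, min_support):
    """Hypothetical merge rule: sum the locally reported counts and keep
    itemsets whose pooled support reaches min_support. A site that did not
    report an itemset contributes 0, so pooled support is a lower bound."""
    pooled = Counter()
    for result in local_results:
        pooled.update(result)  # adds counts for itemsets already seen
    total = sum(site_sizes)
    return {s: c / total for s, c in pooled.items() if c >= min_support * total}


# Two toy "sites"; the raw rows never leave their site, only counts do.
site_a = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b", "c"}]
site_b = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"c"}]

local = [frequent_itemsets(site, 0.5) for site in (site_a, site_b)]
merged = integrate(local, [len(site_a), len(site_b)], 0.5)
```

In this toy run the pair {b, c} is frequent at the first site only and falls below the pooled threshold, which is the kind of loss an integration-quality measure would need to detect. The clustering step and the cluster-quality indices mentioned in the abstract are deliberately left out of the sketch.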