Pattern discovery for large mixed-mode database

Authors:
Andrew K.C. Wong;Bin Wu;Gene P.K. Wu;Keith C.C. Chan
Affiliations:
University of Waterloo, Waterloo, ON, Canada;University of Waterloo, Waterloo, ON, Canada;The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong;The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010



Abstract

In business and industry today, large databases with mixed data types (continuous and categorical) are very common, and there is a great need to discover patterns in them for knowledge interpretation and understanding. For classification, this problem has been solved as a discrete data problem by first discretizing the continuous data based on the class-attribute interdependence relationship. However, no proper solution exists so far when class information is unavailable. Hence, important pattern post-processing tasks such as pattern clustering and summarization cannot be applied to mixed-mode data. This paper presents a new method for solving the problem. It is based on two essential concepts. (1) Even when class information is absent, for a correlated dataset the attribute with the strongest interdependence with the others in the group can be used to drive the discretization of the continuous data. (2) For a large database, correlated attribute groups must first be obtained by attribute clustering before (1) can be applied. Based on (1) and (2), pattern discovery methods are developed for mixed-mode data. Extensive experiments using synthetic and real-world data were conducted to validate the usefulness and effectiveness of the proposed method.
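Concept (1) can be illustrated with a minimal sketch. This is not the paper's algorithm; it rests on several assumptions that the abstract does not state: interdependence is measured by plain mutual information, continuous attributes are provisionally split into two equal-frequency bins for scoring only, the driving attribute is chosen among the categorical attributes, and each continuous attribute receives a single cut point. The function names (`pick_driver`, `best_cut`) and the toy data are hypothetical. The attribute clustering of concept (2) is skipped: the three columns are assumed to already form one correlated group.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def equal_frequency_bins(values, k=2):
    """Provisional rank-based binning into k roughly equal bins (scoring only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def pick_driver(columns, continuous):
    """Categorical attribute with the largest summed MI with all other attributes.

    Assumption: candidates are limited to categorical attributes.
    """
    coarse = {
        name: equal_frequency_bins(col) if name in continuous else col
        for name, col in columns.items()
    }

    def total(name):
        return sum(
            mutual_information(coarse[name], coarse[other])
            for other in coarse
            if other != name
        )

    candidates = [name for name in columns if name not in continuous]
    return max(candidates, key=total)


def best_cut(values, driver_labels):
    """Single cut point on `values` that maximizes MI with the driving attribute."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(
        candidates,
        key=lambda cut: mutual_information([v > cut for v in values], driver_labels),
    )


# Hypothetical toy group: no class label, one continuous attribute.
columns = {
    "plan": ["basic"] * 4 + ["premium"] * 4,
    "usage": [1.0, 2.0, 1.5, 2.5, 8.0, 9.0, 7.5, 8.5],
    "region": ["n", "s", "n", "s", "n", "s", "n", "s"],
}
driver = pick_driver(columns, continuous={"usage"})
cut = best_cut(columns["usage"], columns[driver])
```

On this toy group, `driver` is `"plan"` and `cut` is `5.0`: `plan` is the categorical attribute most interdependent with the rest, and it then plays the role a class label would play in class-dependent discretization of `usage`.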