Attribute Clustering for Grouping, Selection, and Classification of Gene Expression Data

Authors:
Wai-Ho Au; Keith C. C. Chan; Andrew K. C. Wong; Yang Wang
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2005

Abstract

This paper presents an attribute clustering method that groups genes based on their interdependence in order to mine meaningful patterns from gene expression data. It can be used for gene grouping, selection, and classification. Partitioning a relational table into attribute subgroups allows a small number of attributes within or across the groups to be selected for analysis. Clustering attributes reduces the search dimension of a data mining algorithm. This reduction is especially important when mining gene expression data, because such data typically consist of a huge number of genes (attributes) and a small number of gene expression profiles (tuples). Most data mining algorithms are developed and optimized to scale with the number of tuples rather than the number of attributes. The situation becomes even worse when the number of attributes far exceeds the number of tuples: the likelihood of reporting patterns that are irrelevant and arise purely by chance then becomes rather high. For these reasons, gene grouping and selection are important preprocessing steps if many data mining algorithms are to be effective on gene expression data. This paper defines the problem of attribute clustering and introduces a methodology for solving it. Our proposed method groups interdependent attributes into clusters by optimizing a criterion function derived from an information measure that reflects the interdependence between attributes. Applying our algorithm to gene expression data, we discover meaningful clusters of genes. Grouping genes by attribute interdependence within each group helps capture different aspects of gene association patterns in each group. Significant genes selected from each group then contain useful information for gene expression classification and identification.
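The abstract describes the information measure only in general terms. As an illustrative assumption (not a formula quoted from this page), one normalized choice for two discretized attributes $A_i$ and $A_j$ divides their mutual information by their joint entropy. Here $a_{ik}$ denotes the $k$-th value of $A_i$. A criterion function could then reward clusters $C_1, \dots, C_K$ whose members are strongly interdependent with a representative attribute $\eta_r$ of each cluster:

$$
R(A_i : A_j) = \frac{I(A_i : A_j)}{H(A_i, A_j)}, \qquad
I(A_i : A_j) = \sum_{k}\sum_{l} p(a_{ik}, a_{jl}) \log \frac{p(a_{ik}, a_{jl})}{p(a_{ik})\, p(a_{jl})},
$$
$$
H(A_i, A_j) = -\sum_{k}\sum_{l} p(a_{ik}, a_{jl}) \log p(a_{ik}, a_{jl}), \qquad
J = \sum_{r=1}^{K} \sum_{A_i \in C_r} R(A_i : \eta_r).
$$

Because $I(A_i : A_j) \le \min\{H(A_i), H(A_j)\} \le H(A_i, A_j)$, this $R$ lies in $[0, 1]$. It equals 1 only when the value of each attribute determines the value of the other. Normalizing by the joint entropy keeps attributes with many distinct values from dominating the criterion $J$ simply because they carry more entropy.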
To evaluate the performance of the proposed approach, we applied it to two well-known gene expression data sets and compared our results with those obtained by other methods. Our experiments show that the proposed method is able to find meaningful clusters of genes. By selecting a subset of genes that have high multiple interdependence with the other genes in their clusters, significant classification information can be obtained. Thus, a small pool of selected genes can be used to build classifiers with very high classification rates. From this pool, gene expressions of different categories can be identified.
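The grouping-then-selection idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation:

- Attributes are already discretized.
- Interdependence is mutual information normalized by joint entropy.
- Clustering follows a k-modes-style loop in which each cluster is represented by one "mode" attribute.
- Multiple interdependence is taken to be the sum of an attribute's interdependence with the other members of its cluster.

The seeding rule, the toy data, and the names `interdependence`, `multiple_interdependence`, `cluster_attributes`, and `select_genes` are all hypothetical.

```python
import math
from collections import Counter


def interdependence(x, y):
    """Assumed measure R(X:Y) = I(X:Y) / H(X,Y) for two discretized attributes."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    h_xy = -sum((c / n) * math.log(c / n) for c in joint.values())
    if h_xy == 0.0:
        return 0.0  # both attributes are constant; define R as 0 (assumption)
    mi = sum((c / n) * math.log(c * n / (px[a] * py[b]))
             for (a, b), c in joint.items())
    return mi / h_xy


def multiple_interdependence(a, members, r):
    """Assumed definition: sum of R(a : b) over the other members b of a's cluster."""
    return sum(r[(a, b)] for b in members if b != a)


def cluster_attributes(columns, k, max_iter=50):
    """k-modes-style attribute clustering (illustrative sketch; requires k <= #attributes).

    columns: dict mapping attribute name -> list of discrete values, one per tuple.
    Returns (clusters, r), where clusters maps each mode attribute to its members.
    """
    names = list(columns)
    r = {(a, b): interdependence(columns[a], columns[b]) for a in names for b in names}

    # Hypothetical deterministic seeding: each new mode is the attribute
    # least interdependent with the modes chosen so far.
    modes = [names[0]]
    while len(modes) < k:
        rest = [a for a in names if a not in modes]
        modes.append(min(rest, key=lambda a: max(r[(a, m)] for m in modes)))

    for _ in range(max_iter):
        # Assignment step: a mode stays in its own cluster; every other
        # attribute joins the mode it is most interdependent with.
        clusters = {m: [] for m in modes}
        for a in names:
            home = a if a in clusters else max(modes, key=lambda m: r[(a, m)])
            clusters[home].append(a)
        # Update step: the new mode of each cluster is the member with the
        # highest multiple interdependence within that cluster.
        new_modes = [max(members, key=lambda a: multiple_interdependence(a, members, r))
                     for members in clusters.values()]
        if new_modes == modes:
            break
        modes = new_modes
    return clusters, r


def select_genes(clusters, r, top=1):
    """Keep the `top` members of each cluster with the highest multiple interdependence."""
    picked = []
    for members in clusters.values():
        ranked = sorted(members, key=lambda a: multiple_interdependence(a, members, r),
                        reverse=True)
        picked.extend(ranked[:top])
    return picked


# Toy discretized expression levels (made up for illustration).
cols = {
    "g1": [0, 0, 1, 1, 2, 2, 0, 1],
    "g2": [0, 0, 1, 1, 2, 2, 0, 1],
    "g3": [0, 0, 1, 1, 2, 2, 0, 2],
    "g4": [2, 1, 0, 2, 1, 0, 1, 0],
    "g5": [0, 2, 1, 0, 2, 1, 2, 1],
}
clusters, r = cluster_attributes(cols, k=2)
print(clusters)                          # {'g1': ['g1', 'g2', 'g3'], 'g4': ['g4', 'g5']}
print(select_genes(clusters, r, top=1))  # ['g1', 'g4']
```

On this toy data, `g5` is a relabeling of `g4`, so their interdependence is 1. `g1`, `g2`, and `g3` form a second tightly interdependent group. Because attributes have no natural centroid, each cluster is represented by an actual member attribute. That member is then also a natural candidate for the selected gene pool.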