Cluster Analysis for Gene Expression Data: A Survey

Authors:
Daxin Jiang; Chun Tang; Aidong Zhang
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004

Abstract

DNA microarray technology has made it possible to monitor the expression levels of thousands of genes simultaneously, both during important biological processes and across collections of related samples. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of biological networks greatly increase the challenge of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than to the points in different groups. A very rich literature on cluster analysis has developed over the past three decades. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and new algorithms have recently been proposed specifically for gene expression data. These clustering algorithms have proven useful for identifying biologically relevant groups of genes and samples. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. In particular, we divide cluster analysis for gene expression data into three categories. We then present specific challenges pertinent to each clustering category and introduce several representative approaches. We also discuss the problem of cluster validation in three aspects and review various methods for assessing the quality and reliability of clustering results. Finally, we conclude the paper and suggest promising trends in this field.
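
The abstract's definition of cluster analysis (points within a group are more similar to each other than to points in other groups) can be illustrated with a small sketch. The example below is not taken from the paper: it assumes a toy expression matrix with made-up values, hypothetical gene names, Euclidean distance as the similarity measure, and plain k-means with the first k profiles as starting centroids. Any of the algorithms the survey covers could stand in its place.

```python
import math

# Toy expression matrix (assumed values): one row per gene, one column per sample.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # rising pattern
    [4.1, 3.0, 2.1, 1.0],  # falling pattern
    [1.2, 2.1, 2.9, 4.2],  # rising
    [3.9, 3.1, 1.9, 0.8],  # falling
    [0.9, 1.8, 3.1, 3.9],  # rising
    [4.0, 2.9, 2.0, 1.1],  # falling
]

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(rows):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(data, k, iterations=20):
    """Plain k-means; the first k rows serve as deterministic initial centroids."""
    centroids = [list(row) for row in data[:k]]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in data
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    return labels

def avg_pairwise_distance(data, labels, same_cluster):
    """Mean distance over pairs that share (or do not share) a cluster label."""
    dists = [
        euclidean(data[i], data[j])
        for i in range(len(data))
        for j in range(i + 1, len(data))
        if (labels[i] == labels[j]) == same_cluster
    ]
    return sum(dists) / len(dists)

labels = kmeans(profiles, k=2)
within = avg_pairwise_distance(profiles, labels, same_cluster=True)
between = avg_pairwise_distance(profiles, labels, same_cluster=False)

for gene, label in zip(genes, labels):
    print(gene, "-> cluster", label)
print("mean within-cluster distance:", round(within, 3))
print("mean between-cluster distance:", round(between, 3))
```

Comparing the mean within-cluster distance against the mean between-cluster distance is a crude stand-in for the cluster validation the abstract mentions; a real analysis would use one of the dedicated validation methods.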