Gene expression data clustering and visualization based on a binary hierarchical clustering framework

Authors:
Lap Keung Szeto;Alan Wee-Chung Liew;Hong Yan;Sy-sen Tang
Affiliations:
Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong and School of Electrical and Information Engineer University of ...;Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Venue:
APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19
Year:
2003

Abstract

We describe a binary hierarchical clustering (BHC) framework for clustering gene expression data. The BHC algorithm has two major steps. First, the K-means algorithm splits the data into two classes. Second, the Fisher criterion is applied to the two classes to assess whether the split is acceptable. The algorithm is applied recursively to the resulting subclasses and terminates when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumption about the number of samples in each cluster or about the class distribution. The hierarchical framework leads naturally to a tree-structured representation. We show that arranging the BHC-clustered gene expression data in a tree structure makes the clustering results easy to visualize. In addition, the tree display lets users apply prior biological knowledge when finalizing the clustering result.
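
The split-and-test recursion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the function names (`two_means`, `fisher_score`, `bhc`, `leaves`), the deterministic K-means initialization, the acceptance threshold of 10.0, the minimum cluster size of 2, and the simplified Fisher score, which projects both classes onto the line joining their means and computes (difference of projected means)^2 / (sum of projected variances) instead of solving for the full Fisher discriminant direction.

```python
def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, iters=100):
    """Step 1: split points into two groups with plain 2-means (Lloyd's algorithm).
    Assumed initialization: the point farthest from the overall mean, and the
    point farthest from that one."""
    centre = mean(points)
    c0 = max(points, key=lambda p: sq_dist(p, centre))
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    centers, labels = [c0, c1], None
    for _ in range(iters):
        new_labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    left = [p for p, lab in zip(points, labels) if lab == 0]
    right = [p for p, lab in zip(points, labels) if lab == 1]
    return left, right

def fisher_score(left, right):
    """Step 2 (simplified): Fisher-style separation of the two groups after
    projecting them onto the line joining their means."""
    m0, m1 = mean(left), mean(right)
    direction = [b - a for a, b in zip(m0, m1)]

    def project(p):
        return sum(x * d for x, d in zip(p, direction))

    def mean_and_var(values):
        mu = sum(values) / len(values)
        return mu, sum((v - mu) ** 2 for v in values) / len(values)

    mu0, var0 = mean_and_var([project(p) for p in left])
    mu1, var1 = mean_and_var([project(p) for p in right])
    between = (mu0 - mu1) ** 2
    within = var0 + var1
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

def bhc(points, threshold=10.0, min_size=2):
    """Recursive binary split. A leaf is a list of points; an internal node is
    a (left, right) tuple. threshold and min_size are illustrative assumptions."""
    if len(points) < 2 * min_size:
        return points
    left, right = two_means(points)
    if len(left) < min_size or len(right) < min_size:
        return points
    if fisher_score(left, right) < threshold:  # split rejected: keep as one cluster
        return points
    return (bhc(left, threshold, min_size), bhc(right, threshold, min_size))

def leaves(tree):
    """Flatten the tree into its final clusters, in left-to-right order."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

# Toy data standing in for expression profiles: three well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 0), (10, 1), (11, 0), (11, 1),
        (30, 30), (30, 31), (31, 30), (31, 31)]
tree = bhc(data)
for cluster in leaves(tree):
    print(cluster)
```

The nested tuple returned by `bhc` mirrors the tree structure mentioned in the abstract: each internal node records an accepted split, so a user could merge two sibling leaves by hand if prior biological knowledge suggests they belong together. In practice the threshold would need tuning, since its value here is arbitrary.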