Motivation: The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful, fundamental patterns of gene expression. Hierarchical clustering is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. a fixed topology structure, and misclustered data that cannot be reevaluated). In this paper, we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks.
Results: We propose a new tree-structured self-organizing neural network, called the dynamically growing self-organizing tree (DGSOT) algorithm, for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying dataset can be found. In addition, we propose a new cluster validation criterion, based on the geometric properties of the Voronoi partition of the dataset, to find the proper number of clusters at each hierarchical level. This criterion uses the minimum spanning tree (MST) concept from graph theory and is computationally inexpensive for large datasets. A K-level up distribution (KLD) mechanism, which widens the scope of data distribution during hierarchy construction, is used to improve clustering accuracy. The KLD mechanism allows data misclustered in the early stages to be reevaluated at a later stage, which increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. Using a yeast cell cycle microarray expression dataset, we found that our algorithm extracts gene expression patterns at different levels. Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable.
Availability: DGSOT is available upon request from the authors.
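The KLD idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that, when a node is split, the data under its K-th ancestor are pooled and reassigned to the nearest leaf of that ancestor's subtree by Euclidean distance. The names `Node`, `redistribute`, and `build_example`, and the toy coordinates, are hypothetical and chosen only for illustration.

```python
import math


class Node:
    """Tree node with a reference vector; data points are stored on nodes."""

    def __init__(self, center, parent=None):
        self.center = center
        self.parent = parent
        self.children = []
        self.data = []

    def add_child(self, center):
        child = Node(center, parent=self)
        self.children.append(child)
        return child


def subtree(node):
    """Yield node and all of its descendants."""
    yield node
    for child in node.children:
        yield from subtree(child)


def ancestor(node, k):
    """Walk up k levels from node, stopping early at the root."""
    for _ in range(k):
        if node.parent is None:
            break
        node = node.parent
    return node


def redistribute(split_node, k):
    """Assumed KLD step: pool all data under the k-th ancestor of
    split_node and reassign each point to the nearest leaf in that scope."""
    scope = ancestor(split_node, k)
    nodes = list(subtree(scope))
    scope_leaves = [n for n in nodes if not n.children]
    pool = [x for n in nodes for x in n.data]
    for n in nodes:
        n.data = []
    for x in pool:
        nearest = min(scope_leaves, key=lambda leaf: math.dist(x, leaf.center))
        nearest.data.append(x)


def build_example():
    """Toy tree: the point (4.5, 0) was placed under A at the top level,
    although a later, finer leaf under B (B1) is closer to it."""
    root = Node((5.0, 0.0))
    a = root.add_child((0.0, 0.0))
    b = root.add_child((10.0, 0.0))
    b1 = b.add_child((6.0, 0.0))
    b2 = b.add_child((12.0, 0.0))
    b1.data = [(6.0, 0.0)]
    b2.data = [(12.0, 0.0)]
    a.data = [(0.0, 0.0), (2.0, 0.0), (4.5, 0.0)]
    a.add_child((0.0, 0.0))          # A1
    a2 = a.add_child((2.0, 0.0))     # A2
    return a, a2, b1


# K = 0: only the children of A compete, so (4.5, 0) stays under A.
a, a2, b1 = build_example()
redistribute(a, 0)
stays_under_a = (4.5, 0.0) in a2.data

# K = 1: all leaves under the root compete, so (4.5, 0) moves to B1.
a, a2, b1 = build_example()
redistribute(a, 1)
moves_to_b1 = (4.5, 0.0) in b1.data
```

In this toy tree, a larger K lets a point that was assigned to the wrong branch early on move to a closer leaf elsewhere in the hierarchy. The cost is that more leaves and more data points are compared at each redistribution.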