Referential hierarchical clustering algorithm based upon principal component analysis and genetic algorithm

  • Authors:
  • Jui-Shin Lin; Shiaw-Wen Tien; Tung-Shou Chen; Yung-Hung Kao; Chih-Chiang Lin; Yung-Hsing Chiu

  • Affiliations:
  • Graduate School of Technology Management, Graduate School of Computer Science and Information Technology, Chung Hua University, National Taichung Institute of Technology, Hsinchu, Taiwan

  • Venue:
  • ACOS'07 Proceedings of the 6th Conference on WSEAS International Conference on Applied Computer Science - Volume 6
  • Year:
  • 2007

Abstract

Hierarchical Clustering (HC) builds a tree of nested clusters but does not determine the order of the leaf nodes within that tree. As a result, the left-to-right sequence of the leaves is not suitable for revealing similarity relations between samples. To expose these similarity relations in the HC tree diagram, this paper proposes an improved method, the Referential Hierarchical clustering Algorithm (RHA), which combines HC, a Genetic Algorithm (GA), and Principal Component Analysis (PCA) to resolve this problem of traditional HC. PCA reduces a high-dimensional dataset to fewer dimensions for analysis and reconstructs each data point as a suitable linear combination of the principal components, which are ordered by the amount of variance they explain in the original dataset. Because the first principal component explains the most variance, sorting the samples by increasing value of the first principal component yields a reference sequence. RHA therefore uses the GA to find a leaf ordering that keeps the same tree structure as HC while being as similar as possible to this reference sequence. Experimental results show that the clustering result of RHA exposes the similarity relations between the leaf nodes and the clusters. RHA could be applied to any problem to which HC is applied, and the generated tree diagram could help researchers compare and analyze individual samples and find the relations between clusters more easily and quickly.
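The abstract describes RHA only at a high level. The sketch below illustrates the general idea under several assumptions that the abstract does not state:

* The HC step is a naive average-linkage agglomerative clustering on Euclidean distances.
* A GA chromosome holds one "swap children" bit per internal node of the tree, so every chromosome yields a leaf order consistent with the HC tree.
* Fitness is the Spearman footrule distance to the sequence of samples sorted by their first-principal-component scores.
* The GA uses tournament selection, uniform crossover, bit-flip mutation, and elitism.

All function names (`first_pc_scores`, `build_tree`, `leaf_order`, `rank_distance`, `ga_leaf_order`) and parameter values are hypothetical, not the authors' implementation.

```python
import random
import numpy as np

def first_pc_scores(X):
    """Project each sample onto the first principal component (largest-variance direction)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def build_tree(X):
    """Naive average-linkage agglomerative clustering (assumed linkage).

    Returns a binary tree as nested tuples whose leaves are sample indices.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [(i, [i]) for i in range(len(X))]  # (subtree, member indices)
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a][1], clusters[b][1])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

def leaf_order(tree, bits):
    """Leaf sequence when internal nodes (pre-order on the unswapped tree) with bit 1 swap children."""
    it = iter(bits)
    def walk(node):
        if isinstance(node, int):
            return [node]
        swap = next(it)
        left, right = walk(node[0]), walk(node[1])
        return right + left if swap else left + right
    return walk(tree)

def rank_distance(order, reference):
    """Spearman footrule: sum of absolute position differences between two sequences."""
    pos = {s: i for i, s in enumerate(reference)}
    return sum(abs(i - pos[s]) for i, s in enumerate(order))

def ga_leaf_order(tree, reference, n_bits, pop_size=40, generations=100, p_mut=0.05, seed=0):
    """Simple GA over swap bits; minimizes distance to the reference (PC1-sorted) sequence."""
    rng = random.Random(seed)
    cost = lambda g: rank_distance(leaf_order(tree, g), reference)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        def pick():  # binary tournament selection
            a, b = rng.sample(pop, 2)
            return a if cost(a) <= cost(b) else b
        children = [best[:]]  # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = min(pop + [best], key=cost)
    return leaf_order(tree, best), cost(best)

# Toy data: three loose groups of four 3-D samples (synthetic, for illustration only).
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(c, 0.3, size=(4, 3)) for c in (0.0, 3.0, 6.0)])
scores = first_pc_scores(X)
reference = [int(i) for i in np.argsort(scores)]
tree = build_tree(X)
order, dist = ga_leaf_order(tree, reference, n_bits=len(X) - 1)
print(order, dist)
```

A binary tree over n samples has n - 1 internal nodes, so the search space holds 2^(n-1) leaf orders, all consistent with the same HC tree. Setting every swap bit reverses the whole leaf sequence, so the arbitrary sign of the first principal component does not limit which reference ordering the GA can match.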