Shortest-linkage-based parallel hierarchical clustering on main-belt moving objects of the solar system

Authors:
Cheng-Hsien Tang;Meng-Feng Tsai;Shan-Hao Chuang;Jen-Jung Cheng;Wei-Jen Wang
Venue:
Future Generation Computer Systems
Year:
2014

Abstract

Data clustering is an important data preparation step in many scientific analyses. In astronomy, distributed environments and modern observation techniques let users collect and access huge amounts of data, but clustering that data can become very costly. One challenge is that sequential clustering algorithms, which can be applied to cluster hundreds of thousands of main-belt asteroids in order to reason about their origins, may not be directly usable in a distributed environment. This study therefore focuses on parallelizing the traditional hierarchical agglomerative clustering algorithm with shortest linkage. We propose a new parallel hierarchical agglomerative clustering algorithm based on the master-worker model. The master process divides the whole computation into several small tasks and distributes them to the worker processes for parallel processing. The master process then merges the results from the worker processes to form a hierarchical data structure. The proposed algorithm uses a pruning threshold to reduce both the execution time and the storage requirement during the computation. It also supports fast incremental updates: new data items can be merged into an already constructed hierarchical tree of about 550,000 data items in seconds. To evaluate the performance of the algorithm, we conducted several experiments using the MPCORB dataset and a dataset from the DVO database. The results confirm the efficiency of the proposed methodology. Compared with similar prior studies, the proposed algorithm is more flexible and practical for distributed hierarchical agglomerative clustering.
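
The abstract gives no pseudocode, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes that "shortest linkage" means the usual single-linkage criterion (clusters are merged by the smallest distance between any two of their members) and that the pruning threshold discards every pair whose distance exceeds it. The function names `worker_edges`, `master_merge`, and `cluster`, the strided block partitioning, and the use of a thread pool in place of distributed worker processes are all hypothetical choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def worker_edges(points, block_a, block_b, same_block, threshold):
    """Worker task: candidate edges between two index blocks, pruned by threshold."""
    edges = []
    for i in block_a:
        for j in block_b:
            if same_block and i >= j:
                continue  # within one block, visit each unordered pair once
            d = dist(points[i], points[j])
            if d <= threshold:  # assumed pruning rule: longer edges are never stored
                edges.append((d, min(i, j), max(i, j)))
    return edges


def master_merge(n, edges):
    """Master step: merge clusters in order of increasing edge length (single linkage)."""
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []  # (distance, i, j) for every edge that joined two clusters
    for d, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            merges.append((d, i, j))
    labels = [find(x) for x in range(n)]
    return merges, labels


def cluster(points, threshold, n_blocks=2, n_workers=2):
    """Split the pairwise-distance work into block-pair tasks, then merge the results."""
    n = len(points)
    blocks = [range(k, n, n_blocks) for k in range(n_blocks)]
    # One task per unordered pair of blocks, including each block with itself.
    tasks = [(blocks[a], blocks[b], a == b)
             for a in range(n_blocks) for b in range(a, n_blocks)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda t: worker_edges(points, *t, threshold), tasks)
        edges = [e for part in results for e in part]
    return master_merge(n, edges)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
    merges, labels = cluster(pts, threshold=2.0)
    print(merges)
    print(labels)
```

In this sketch the list of merges, ordered by distance, plays the role of the hierarchical structure: cutting it at any distance below the threshold yields the single-linkage clusters at that level. Points whose nearest neighbor is farther away than the threshold simply remain in their own clusters, which is how the pruning keeps the stored edge set small.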