On the parallel complexity of hierarchical clustering and CC-complete problems

  • Authors:
  • Raymond Greenlaw; Sanpawat Kantabutra

  • Affiliations:
  • Department of Computer Science, Armstrong Atlantic State University, Savannah, Georgia 31419-1997; The Theory of Computation Group, Computer Science Department, Chiang Mai University, Chiang Mai 50200, Thailand

  • Venue:
  • Complexity
  • Year:
  • 2008

Abstract

Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well-studied clustering method involving both top-down and bottom-up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top-down and bottom-up hierarchical clustering. The top-down algorithm can be parallelized: when there are n points to be clustered, we provide an O(log n)-time, n^2-processor CREW PRAM algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom-up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC-complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now, for the first time, a problem from data mining as well. By proving that HCP is CC-complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm that we give for the top-down sequential approach, and it surprisingly shows that the parallel complexities of the top-down and bottom-up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC-complete problems. © 2008 Wiley Periodicals, Inc. Complexity, 2008
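
For readers unfamiliar with the COMPARATOR CIRCUIT VALUE PROBLEM that defines the class CC, the following minimal sketch may help. It is not taken from the article. It assumes the standard textbook notion of a comparator gate, which takes two Boolean wires and replaces them with their AND and their OR. The names `eval_comparator_circuit` and `ccvp`, and the encoding of a circuit as a list of wire-index pairs, are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple


def eval_comparator_circuit(
    inputs: Sequence[bool], gates: Sequence[Tuple[int, int]]
) -> List[bool]:
    """Evaluate a comparator circuit gate by gate, in the given order.

    Assumed encoding (illustrative, not from the article): each gate is a
    pair (lo, hi) of wire indices. After the gate fires, wire `lo` carries
    the AND (minimum) of the two values and wire `hi` carries the OR
    (maximum).
    """
    wires = list(inputs)
    for lo, hi in gates:
        a, b = wires[lo], wires[hi]
        wires[lo] = a and b  # minimum of the two Boolean values
        wires[hi] = a or b   # maximum of the two Boolean values
    return wires


def ccvp(
    inputs: Sequence[bool], gates: Sequence[Tuple[int, int]], output_wire: int
) -> bool:
    """Decision version: is the designated output wire true at the end?"""
    return eval_comparator_circuit(inputs, gates)[output_wire]


if __name__ == "__main__":
    # Two gates on three wires; the single true value is pushed to wire 2.
    print(eval_comparator_circuit([True, False, False], [(0, 1), (1, 2)]))
```

The loop above is inherently sequential: each gate reads values written by earlier gates. The open question behind the abstract's "unless CC equals NC" caveat is whether such evaluations can always be carried out in polylogarithmic parallel time.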