A Fast Approximation Algorithm for the k Partition-Distance Problem

  • Authors: Yen Hung Chen

  • Affiliations: Department of Information Science and Management Systems, National Taitung University, Taitung, Taiwan, R.O.C. 950

  • Venue: ICCSA '09 Proceedings of the International Conference on Computational Science and Its Applications: Part II
  • Year: 2009

Abstract

Given a set of elements N, a partition divides the set into two or more disjoint clusters that together cover all elements. Each cluster is a non-empty subset of elements, so the number of clusters in a partition is at most |N|. Different partitioning algorithms for the same application will produce different partitions of the same set of elements. Computing the distance between two or more partitions, and finding their consensus partition (also called consensus clustering), are important problems that arise in many applications such as bioinformatics and data mining. However, different distance functions between partitions usually have to be computed by different algorithms. In this paper, we discuss the k partition-distance problem, which can be applied in bioinformatics. Given a set of elements N with k partitions, the k partition-distance problem is to delete the minimum number of elements from each partition such that all remaining partitions become identical. This problem has been shown to be NP-complete when $k > 2$. We present the first known approximation algorithm for it, with performance ratio 2 and running time $O(k \cdot \rho \cdot |N|)$, where ρ is the maximum number of clusters among the k partitions. We then run our algorithm on simulated random data and on an actual set of organisms based on DNA markers. The results show that, in practice, the partition-distance of our approximate solution is at most twice that of the optimal solution.
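To make the problem definition concrete, the following is a minimal Python sketch that computes the k partition-distance of a tiny instance by exhaustive search. It is not the approximation algorithm of the paper, which the abstract does not describe; it only illustrates the objective (delete as few elements as possible so that all k partitions coincide). The function names `restrict` and `partition_distance_bruteforce` and the sample data are hypothetical, and the search is exponential in |N|, so it is only usable on very small inputs.

```python
from itertools import combinations


def restrict(partition, keep):
    """Restrict a partition to the elements in `keep`, dropping emptied clusters."""
    return frozenset(
        frozenset(cluster & keep) for cluster in partition if cluster & keep
    )


def partition_distance_bruteforce(elements, partitions):
    """Return (distance, deleted_elements) by trying deletions of increasing size.

    Illustrative only: this is an exhaustive search, not the paper's
    2-approximation algorithm.
    """
    elements = list(elements)
    for size in range(len(elements) + 1):
        for removed in combinations(elements, size):
            keep = set(elements) - set(removed)
            # All partitions agree if their restrictions collapse to one value.
            if len({restrict(p, keep) for p in partitions}) == 1:
                return size, set(removed)
    # Not reached: deleting every element always makes the partitions identical.
    raise AssertionError("unreachable")


# Hypothetical example: three partitions of N = {1, 2, 3, 4, 5}.
N = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3}, {4, 5}]
P2 = [{1, 2}, {3, 4, 5}]
P3 = [{1, 2, 3}, {4, 5}]

distance, deleted = partition_distance_bruteforce(N, [P1, P2, P3])
print(distance, deleted)
```

In this example the three partitions disagree only on where element 3 belongs, so deleting that single element leaves the clusters {1, 2} and {4, 5} in every partition.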