A Fast Approximation Algorithm for the k Partition-Distance Problem

Authors:
Yen Hung Chen
Affiliations:
Department of Information Science and Management Systems, National Taitung University, Taitung, Taiwan, R.O.C. 950
Venue:
ICCSA '09 Proceedings of the International Conference on Computational Science and Its Applications: Part II
Year:
2009

Abstract

Given a set of elements N, a partition divides the set into two or more disjoint clusters that together cover all elements. Each cluster is a non-empty subset of elements, so the number of clusters in a partition is at most |N|. Different partitioning algorithms for the same application will generally produce different partitions of the same set of elements. Computing the distance between two or more partitions, and finding their consensus partition (also called consensus clustering), are important and interesting problems that arise in many applications such as bioinformatics and data mining. However, different distance functions between partitions usually require different algorithms. In this paper, we discuss the k partition-distance problem, which has applications in bioinformatics. Given a set of elements N with k partitions, the k partition-distance problem is to delete the minimum number of elements from each partition such that all remaining partitions become identical. This problem has been shown to be NP-complete when k > 2. We present the first known approximation algorithm for this problem, with performance ratio 2 and running time $O(k \cdot \rho \cdot |N|)$, where ρ is the maximum number of clusters among the k partitions. We then run our algorithm on simulated random data and on an actual set of organisms based on DNA markers. The results show that, in practice, our approximate solution is at most twice the partition-distance of the optimal solution.
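To make the problem definition concrete, the following Python sketch computes the k partition-distance of a tiny instance by exhaustive search. It illustrates only the definition above and is not the 2-approximation algorithm of the paper, whose steps are not given in this abstract. The function names `restrict` and `brute_force_distance` and the example partitions are hypothetical, chosen for illustration, and the search takes exponential time, so it suits only very small inputs.

```python
from itertools import combinations


def restrict(partition, removed):
    """Delete `removed` from every cluster and drop clusters that become empty."""
    clusters = (frozenset(cluster) - removed for cluster in partition)
    return frozenset(c for c in clusters if c)


def brute_force_distance(elements, partitions):
    """Return (distance, deleted_set) for the k partition-distance problem.

    Tries deletion sets in order of increasing size and stops at the first
    one that makes all partitions identical. Exponential time: this is an
    illustration of the definition, not an efficient or approximate method.
    """
    elements = list(elements)
    for size in range(len(elements) + 1):
        for candidate in combinations(elements, size):
            removed = frozenset(candidate)
            restricted = {restrict(p, removed) for p in partitions}
            if len(restricted) == 1:  # all k partitions now coincide
                return size, set(removed)
    # Not reached: deleting every element always makes the partitions equal.
    raise AssertionError("unreachable")


# Hypothetical example: three partitions (k = 3) of N = {1, 2, 3, 4, 5}.
p1 = [{1, 2, 3}, {4, 5}]
p2 = [{1, 2}, {3, 4, 5}]
p3 = [{1, 2}, {3}, {4, 5}]

distance, deleted = brute_force_distance(range(1, 6), [p1, p2, p3])
print(distance, deleted)
```

In this example the three partitions disagree only on where element 3 belongs, so deleting it from each partition leaves the common partition {1, 2}, {4, 5}.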