Computing sharp bounds for hard clustering problems on trees

Authors:
Isabella Lari;Maurizio Maravalle;Bruno Simeone
Affiliations:
Department of Statistics, "La Sapienza" University, Piazzale A. Moro 5, 00185 Rome, Italy;Department of Systems and Institutions for Economics, University of L'Aquila, Piazza del Santuario 19, 67040 Roio Poggio, L'Aquila, Italy;Department of Statistics, "La Sapienza" University, Piazzale A. Moro 5, 00185 Rome, Italy
Venue:
Discrete Applied Mathematics
Year:
2009

Abstract

Clustering problems with relational constraints in which the underlying graph is a tree arise in a variety of applications, including hierarchical database paging, communication and distribution networks, districting, and biological taxonomy. They are formulated here as optimal tree partitioning problems. A previous paper showed that their computational complexity depends strongly on the nature of the objective function and, in particular, that minimizing the total within-cluster dissimilarity or the diameter is computationally hard. We propose heuristics that find good partitions within a reasonable time, even for relatively large instances. These heuristics are based on solving continuous relaxations of certain integer (or almost integer) linear programs. Experimental results on over 2000 randomly generated instances with up to 500 entities show that the objective values (total within-cluster dissimilarity or diameter) of the solutions provided by these heuristics are quite close to the minimum.
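
To make the two objectives named in the abstract concrete, the following is a minimal illustrative sketch, not the heuristics of the paper. It assumes that a feasible clustering into p clusters is obtained by deleting p - 1 edges of the tree, so that every cluster is a connected subtree, and that the diameter objective is the largest dissimilarity between two entities in the same cluster. It finds the optimum by exhaustive enumeration, which is only practical for very small trees. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def components(n, edges):
    """Connected components of a forest on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


def total_within_dissimilarity(clusters, d):
    """Sum of d[i][j] over all pairs of entities sharing a cluster."""
    return sum(d[i][j] for c in clusters for i, j in combinations(c, 2))


def max_diameter(clusters, d):
    """Largest dissimilarity between two entities sharing a cluster."""
    return max((d[i][j] for c in clusters for i, j in combinations(c, 2)),
               default=0)


def best_partition(n, tree_edges, d, p, objective):
    """Brute force: try every way of deleting p - 1 tree edges."""
    best = None
    for removed in combinations(tree_edges, p - 1):
        kept = [e for e in tree_edges if e not in removed]
        clusters = components(n, kept)
        value = objective(clusters, d)
        if best is None or value < best[0]:
            best = (value, clusters)
    return best


# Toy instance (assumed data): a path 0-1-2-3 with entities placed on a line.
x = [0, 1, 5, 6]
d = [[abs(a - b) for b in x] for a in x]
tree_edges = [(0, 1), (1, 2), (2, 3)]

best_sum = best_partition(4, tree_edges, d, 2, total_within_dissimilarity)
best_diam = best_partition(4, tree_edges, d, 2, max_diameter)
```

On this toy path both objectives select the split {0, 1}, {2, 3}, with total within-cluster dissimilarity 2 and diameter 1. The enumeration visits every choice of p - 1 edges, which grows combinatorially with the size of the tree; this is the kind of cost that motivates heuristics for larger instances.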