Boosting margin based distance functions for clustering

Authors:
Tomer Hertz; Aharon Bar-Hillel; Daphna Weinshall
Affiliations:
The Hebrew University of Jerusalem, Jerusalem, Israel (all three authors)
Venue:
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Year:
2004

Abstract

The performance of graph-based clustering methods critically depends on the quality of the distance function used to compute similarities between pairs of neighboring nodes. In this paper we learn distance functions by training binary classifiers with margins. The classifiers are defined over the product space of pairs of points and are trained to distinguish whether two points come from the same class or not; the signed margin of the classifier on a pair is used as the distance between its two points. Our main contribution is a distance learning method (DistBoost), which combines boosting hypotheses over the product space with a weak learner based on partitioning the original feature space. Each weak hypothesis is a Gaussian mixture model computed using a semi-supervised constrained EM algorithm, which is trained on both unlabeled and labeled data. We also consider SVMs and boosted decision trees as margin-based classifiers in the product space. We experimentally compare the margin-based distance functions with other existing metric learning methods, and with existing techniques for incorporating constraints directly into various clustering algorithms. Clustering performance is measured on several benchmark data sets from the UCI repository, a sample from the MNIST database, and a data set of color images of animals. In most cases, DistBoost significantly and robustly outperformed its competitors.
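The product-space idea can be illustrated with a minimal Python sketch, written under explicit assumptions. It is not DistBoost itself. The paper's weak learner is a Gaussian mixture fitted by constrained EM on labeled and unlabeled data. Here, as a deliberately simpler stand-in, each weak hypothesis comes from a two-cell partition of a single feature (a threshold), and only labeled points are used. A pair gets label +1 when its points come from different classes and -1 otherwise. This sign convention is an assumption, chosen so that a large signed margin means a large distance. Pairs, not points, carry the boosting weights (discrete AdaBoost). The learned distance is the normalized margin, sum_t alpha_t h_t(a, b) / sum_t alpha_t, which lies in [-1, 1]. All function names (`make_pairs`, `split_hypothesis`, `candidate_splits`, `boost_distance`) are hypothetical.

```python
import math
from itertools import combinations


def make_pairs(points, labels):
    """Product-space training set: one entry per unordered pair of points.

    Assumed convention: y = +1 if the two points have different labels,
    y = -1 if they share a label, so a large margin means "far apart".
    """
    return [
        (points[i], points[j], 1 if labels[i] != labels[j] else -1)
        for i, j in combinations(range(len(points)), 2)
    ]


def split_hypothesis(feature, threshold):
    """Pair hypothesis induced by a two-cell partition of the feature space.

    Hypothetical stand-in for the paper's GMM partition: +1 if the points
    land in different cells, -1 if they land in the same cell.
    """
    def h(a, b):
        return 1 if (a[feature] <= threshold) != (b[feature] <= threshold) else -1
    return h


def candidate_splits(points):
    """Midpoints between consecutive distinct values of every feature."""
    splits = []
    for f in range(len(points[0])):
        vals = sorted({p[f] for p in points})
        splits += [(f, (u + v) / 2) for u, v in zip(vals, vals[1:])]
    return splits


def boost_distance(points, labels, rounds=10):
    """Discrete AdaBoost over pairs; returns d(a, b) = normalized signed margin."""
    pairs = make_pairs(points, labels)
    if not pairs:
        return lambda a, b: 0.0
    w = [1.0 / len(pairs)] * len(pairs)  # one weight per pair, not per point
    splits = candidate_splits(points)
    ensemble = []  # list of (alpha, hypothesis)
    for _ in range(rounds):
        best = None
        for f, t in splits:
            h = split_hypothesis(f, t)
            err = sum(wi for wi, (a, b, y) in zip(w, pairs) if h(a, b) != y)
            if best is None or err < best[0]:
                best = (err, h)
        if best is None or best[0] >= 0.5:
            break  # no weak hypothesis beats chance on the weighted pairs
        err, h = best
        e = max(err, 1e-10)  # avoid dividing by zero when a split is perfect
        alpha = 0.5 * math.log((1 - e) / e)
        ensemble.append((alpha, h))
        if err == 0:
            break  # perfect split: reweighting would leave the weights unchanged
        # Up-weight the pairs this hypothesis got wrong, then renormalize.
        w = [wi * math.exp(-alpha * y * h(a, b)) for wi, (a, b, y) in zip(w, pairs)]
        z = sum(w)
        w = [wi / z for wi in w]
    if not ensemble:
        return lambda a, b: 0.0
    total = sum(alpha for alpha, _ in ensemble)

    def distance(a, b):
        # Weighted vote in [-1, 1]: negative = "same class", positive = "different".
        return sum(alpha * h(a, b) for alpha, h in ensemble) / total

    return distance
```

On two well-separated toy clusters, the first split found already separates them perfectly, so the learned function returns -1 for same-cluster pairs and +1 for cross-cluster pairs. If a clustering algorithm expects nonnegative distances, (1 + d) / 2 maps the output to [0, 1].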