Semisupervised learning from dissimilarity data

Authors:
Michael W. Trosset; Carey E. Priebe; Youngser Park; Michael I. Miller
Affiliations:
Department of Statistics, Indiana University, Bloomington, IN 47405, USA; Department of Applied Mathematics & Statistics, Johns Hopkins University, Baltimore, MD 21218, USA; Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218, USA; Center for Imaging Science, Johns Hopkins University, Baltimore, MD 21218, USA
Venue:
Computational Statistics & Data Analysis
Year:
2008

Abstract

The following two-stage approach to learning from dissimilarity data is described: (1) embed both labeled and unlabeled objects in a Euclidean space; then (2) train a classifier on the labeled objects. Emphasis is placed on linear discriminant analysis for (2), which naturally invites the use of classical multidimensional scaling for (1). The choice of the dimension of the Euclidean space in (1) is a model selection problem: too few or too many dimensions can degrade classifier performance. The question of how including the unlabeled objects in (1) affects classifier performance is investigated. In the case of spherical covariances, including unlabeled objects in (1) is demonstrably superior to excluding them. Several examples are presented.
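The two stages can be sketched as follows. This is a minimal illustration, assuming NumPy and a symmetric dissimilarity matrix whose first m rows and columns belong to the labeled objects. The function names (`classical_mds`, `fit_lda`, `predict_lda`, `semisupervised_classify`) are hypothetical, the embedding dimension `d` is simply passed in rather than selected by any model selection procedure, and the LDA step is a textbook plug-in version (class means, empirical priors, pooled within-class covariance, with a pseudo-inverse as a safeguard). It is not the authors' implementation.

```python
import numpy as np


def classical_mds(delta, d):
    """Classical (Torgerson) MDS: embed n objects in R^d from an n x n dissimilarity matrix."""
    delta = np.asarray(delta, dtype=float)
    n = delta.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J (delta**2) J, J = I - 11'/n.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (delta ** 2) @ J
    # eigh returns the eigenvalues of the symmetric matrix B in ascending order.
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]
    # Negative eigenvalues signal non-Euclidean structure; clip them to zero.
    lam = np.clip(vals[top], 0.0, None)
    return vecs[:, top] * np.sqrt(lam)


def fit_lda(X, y):
    """Plug-in LDA: class means, class priors, pooled within-class covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Residuals of each object about its own class mean.
    resid = X - means[np.searchsorted(classes, y)]
    S = resid.T @ resid / (len(y) - len(classes))
    # Pseudo-inverse guards against a singular pooled covariance (an assumption, not from the paper).
    return classes, means, priors, np.linalg.pinv(S)


def predict_lda(model, X):
    classes, means, priors, S_inv = model
    # Linear discriminant scores: x' S^-1 mu_k - 1/2 mu_k' S^-1 mu_k + log pi_k.
    scores = (X @ S_inv @ means.T
              - 0.5 * np.sum(means @ S_inv * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]


def semisupervised_classify(delta, y_labeled, d):
    """Hypothetical wrapper: rows/columns 0..m-1 of delta are the labeled objects."""
    y_labeled = np.asarray(y_labeled)
    m = len(y_labeled)
    X = classical_mds(delta, d)            # stage (1): embed labeled AND unlabeled objects
    model = fit_lda(X[:m], y_labeled)      # stage (2): train on the labeled objects only
    return predict_lda(model, X[m:])       # classify the unlabeled objects


if __name__ == "__main__":
    # Synthetic example: two spherical Gaussian clusters, Euclidean distances as dissimilarities.
    rng = np.random.default_rng(0)
    A = rng.normal([0.0, 0.0], 0.5, size=(20, 2))
    C = rng.normal([5.0, 5.0], 0.5, size=(20, 2))
    pts = np.vstack([A[:5], C[:5], A[5:], C[5:]])  # 10 labeled objects, then 30 unlabeled
    delta = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    pred = semisupervised_classify(delta, [0] * 5 + [1] * 5, d=2)
    truth = np.array([0] * 15 + [1] * 15)
    print("accuracy on unlabeled objects:", np.mean(pred == truth))
```

Because stage (1) double-centers the full matrix, the unlabeled objects influence the embedding coordinates of the labeled ones. Running `classical_mds` on the labeled rows and columns alone would give the alternative to compare against.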