Learning a distance metric for object identification without human supervision

Authors:
Satoshi Oyama;Katsumi Tanaka
Affiliations:
Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan;Department of Social Informatics, Graduate School of Informatics, Kyoto University, Kyoto, Japan
Venue:
PKDD'06 Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2006

Abstract

A method is described for learning a distance metric for object identification without human supervision. It rests on two assumptions: that different names refer to different objects, and that names are arbitrary. Together, these assumptions justify treating pairs of data items belonging to objects with different names as “cannot-be-linked” example pairs when learning a distance metric for clustering ambiguous names. Using only dissimilar example pairs, the metric learning is formulated as a convex quadratic programming problem, which can be solved much faster than the semi-definite programming problem that generally must be solved to learn a distance metric matrix. Experiments on author identification using a bibliographic database showed that the learned metric improves the identification F-measure.
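
The idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the metric is restricted to a diagonal weight vector w, with squared distance d_w(x, y) = sum_k w_k (x_k - y_k)^2; the objective is a soft-margin one, 0.5 * ||w||^2 + C * sum of hinge losses, asking each cannot-be-linked pair to be at least distance 1 apart; and a projected subgradient loop stands in for a proper quadratic programming solver. The function names `cannot_link_pairs` and `learn_diagonal_metric`, and the toy records, are hypothetical.

```python
from itertools import combinations

import numpy as np


def cannot_link_pairs(names):
    """Return index pairs (i, j) of records whose names differ.

    Under the abstract's two assumptions, such pairs can serve as
    "cannot-be-linked" examples without any human labelling.
    """
    return [(i, j) for i, j in combinations(range(len(names)), 2)
            if names[i] != names[j]]


def learn_diagonal_metric(X, pairs, C=1.0, lr=0.01, epochs=200):
    """Learn non-negative per-feature weights w (illustrative only).

    Minimises 0.5 * ||w||^2 + C * sum(max(0, 1 - w . d_ij)) with w >= 0,
    where d_ij is the element-wise squared difference of a dissimilar pair.
    Projected subgradient descent is used here instead of a QP solver.
    """
    D = np.array([(X[i] - X[j]) ** 2 for i, j in pairs])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        violated = D @ w < 1.0                   # pairs still closer than margin 1
        grad = w - C * D[violated].sum(axis=0)   # subgradient of the objective
        w = np.maximum(w - lr * grad, 0.0)       # project back onto w >= 0
    return w


# Toy data (made up): feature 0 separates the two names, feature 1 is constant.
names = ["A. Smith", "A. Smith", "B. Jones", "B. Jones"]
X = np.array([[0.0, 5.0], [0.1, 5.0], [3.0, 5.0], [3.1, 5.0]])

pairs = cannot_link_pairs(names)
w = learn_diagonal_metric(X, pairs)
```

In this toy run the weight on the constant feature stays at zero, while the feature that separates differently named records receives a positive weight. A learned w of this kind could then be plugged into the distance used when clustering records that share an ambiguous name.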