Learning an optimal distance metric in a linguistic vector space

  • Authors:
  • Daichi Mochihashi; Genichiro Kikui; Kenji Kita

  • Affiliations:
  • ATR Spoken Language Communication Research Laboratories, Kyoto, 619-0288 Japan and Graduate School of Information Science, NAIST, Nara, 630-0192 Japan; ATR Spoken Language Communication Research Laboratories, Kyoto, 619-0288 Japan; ATR Spoken Language Communication Research Laboratories, Kyoto, 619-0288 Japan and Center for Advanced Information Technology, Tokushima University, Tokushima, 770-8506 Japan

  • Venue:
  • Systems and Computers in Japan
  • Year:
  • 2006

Abstract

Much of natural language processing still relies on the Euclidean distance between two feature vectors, but the Euclidean distance has severe defects with respect to feature weightings and feature correlations. In this paper we propose an optimal metric distance function that can be used as an alternative to the Euclidean distance and that addresses both problems at the same time. This metric is optimal in the sense of global quadratic minimization, and can be obtained from the clusters in the training data in a supervised fashion. We have confirmed the effect of the proposed metric on sentence retrieval, document retrieval, and K-means clustering of general vectorial data. © 2006 Wiley Periodicals, Inc. Syst Comp Jpn, 37(9): 12–21, 2006; Published online in Wiley InterScience. DOI 10.1002/scj.20533
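
The abstract does not state the formula, so the following is only an illustrative sketch of one common way to learn a metric from labeled clusters, and it may differ from the authors' formulation. It assumes a distance of the form d(x, y) = sqrt((x - y)^T M (x - y)) and chooses M to minimize the summed squared distance of training vectors to their cluster centroids, subject to det(M) = 1. Under those assumptions the minimizer has the closed form M = det(A)^(1/n) A^(-1), where A is the within-cluster scatter matrix. The names learn_metric and metric_distance, the ridge parameter, and the toy data are hypothetical.

```python
import numpy as np


def learn_metric(X, labels, ridge=1e-8):
    """Return M minimizing within-cluster squared distance subject to det(M) = 1.

    Assumed formulation (not taken from the abstract):
    M = det(A)^(1/n) * inv(A), with A the within-cluster scatter matrix.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[1]
    A = np.zeros((n, n))
    for c in np.unique(labels):
        members = X[labels == c]
        diff = members - members.mean(axis=0)  # deviations from the centroid
        A += diff.T @ diff
    A += ridge * np.eye(n)  # assumption: a small ridge keeps A invertible
    _, logdet = np.linalg.slogdet(A)
    return np.exp(logdet / n) * np.linalg.inv(A)


def metric_distance(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) under the learned matrix M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))


# Toy data: two clusters that vary widely along the first feature and are
# separated along the second feature.
X = [[-4.0, 0.1], [0.0, -0.1], [4.0, 0.0],
     [-4.0, 3.1], [0.0, 2.9], [4.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
M = learn_metric(X, labels)

print(metric_distance([0, 0], [1, 0], M))  # step along the noisy feature
print(metric_distance([0, 0], [0, 1], M))  # step along the discriminative feature
```

In this toy setting, a unit step along the first feature (which varies widely inside each cluster) comes out shorter than its Euclidean length of 1, and a unit step along the second feature (which separates the clusters) comes out longer. The off-diagonal entries of M account for correlation between the two features, so the sketch touches both defects the abstract names.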