A discriminative candidate generator for string transformations

  • Authors: Naoaki Okazaki; Yoshimasa Tsuruoka; Sophia Ananiadou; Jun'ichi Tsujii

  • Affiliations: University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan; University of Manchester, Manchester Interdisciplinary Biocentre, Manchester, UK; University of Manchester, Manchester Interdisciplinary Biocentre, Manchester, UK; University of Tokyo, Hongo, Bunkyo-ku, Tokyo, Japan and University of Manchester, Manchester Interdisciplinary Biocentre, Manchester, UK

  • Venue: EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
  • Year: 2008

Abstract

String transformation, which maps a source string s into its desired form t*, underlies applications such as stemming, lemmatization, and spelling correction. An essential step in string transformation is generating the candidates into which a given string s is likely to be transformed. This paper presents a discriminative approach to generating candidate strings. We use substring substitution rules as features and score them with an L1-regularized logistic regression model. We also propose a procedure for generating negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm, because the string transformation processes are tractable in the model. We demonstrate the strong performance of the proposed method in normalizing inflected words and spelling variations.
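
To make the idea of scoring substring substitution rules with a logistic model concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a single rule per (s, t) pair, obtained by stripping the common prefix and suffix, and it uses hand-picked hypothetical weights rather than weights learned with L1 regularization. The names `substitution_rule`, `score`, and `weights` are illustrative only.

```python
import math


def substitution_rule(source, target):
    """Return the (old, new) substrings left after removing the
    longest common prefix and suffix of source and target."""
    shorter = min(len(source), len(target))
    # Length of the longest common prefix.
    p = 0
    while p < shorter and source[p] == target[p]:
        p += 1
    # Length of the longest common suffix that does not overlap the prefix.
    q = 0
    while (q < shorter - p
           and source[len(source) - 1 - q] == target[len(target) - 1 - q]):
        q += 1
    return source[p:len(source) - q], target[p:len(target) - q]


def score(source, target, weights, bias=0.0):
    """Logistic score of transforming source into target.
    Rules absent from `weights` contribute zero, which mirrors the
    sparse weight vector that L1 regularization tends to produce."""
    rule = substitution_rule(source, target)
    z = bias + weights.get(rule, 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, chosen by hand for illustration.
weights = {("ies", "y"): 2.0, ("u", ""): 1.0}

print(substitution_rule("studies", "study"))
print(score("studies", "study", weights))
```

Storing the weights in a dictionary keyed by rule keeps the model sparse: only rules with nonzero weight need to be considered when enumerating candidates for a given source string.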