Query-based text normalization selection models for enhanced retrieval accuracy

  • Authors:
  • Si-Chi Chin; Rhonda DeCook; W. Nick Street; David Eichmann

  • Affiliations:
  • The University of Iowa, Iowa City; The University of Iowa, Iowa City; The University of Iowa, Iowa City; The University of Iowa, Iowa City

  • Venue:
  • SS '10 Proceedings of the NAACL HLT 2010 Workshop on Semantic Search
  • Year:
  • 2010

Abstract

Text normalization transforms words into a base form so that terms from the same equivalence class match. Traditionally, information retrieval systems employ stemming techniques to remove derivational affixes. Depluralization, the transformation of plurals into singular forms, is also used as a low-level text normalization technique that preserves more of the precise lexical semantics of the text. Experimental results suggest that the choice of text normalization technique should be made individually for each topic to enhance information retrieval accuracy. This paper proposes a hybrid approach that constructs a query-based selection model to choose the appropriate text normalization technique (stemming, depluralization, or no text normalization). The selection model uses ambiguity properties extracted from queries to train a composite of Support Vector Regression (SVR) models that predicts the text normalization technique yielding the highest Mean Average Precision (MAP). Based on our study, such a selection model holds promise for improving retrieval accuracy.
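
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `select_technique`, `normalize_query`, `toy_depluralize`, and `toy_stem` are hypothetical, the two normalizers are deliberately crude stand-ins for a real depluralizer and stemmer, and each per-technique predictor is assumed to be any callable that maps a query's feature vector to a predicted MAP (in the paper's setting, a trained SVR model would play that role).

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]               # query features (assumed numeric)
Predictor = Callable[[Features], float]  # stand-in for one trained regressor


def toy_depluralize(token: str) -> str:
    """Toy depluralizer (assumption): covers only a few regular plurals."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def toy_stem(token: str) -> str:
    """Toy suffix stripper (assumption): not a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


NORMALIZERS: Dict[str, Callable[[str], str]] = {
    "none": lambda token: token,
    "depluralize": toy_depluralize,
    "stem": toy_stem,
}


def select_technique(features: Features, predictors: Dict[str, Predictor]) -> str:
    """Return the technique whose predictor gives the highest predicted MAP.

    Ties go to the technique listed first in `predictors`.
    """
    return max(predictors, key=lambda name: predictors[name](features))


def normalize_query(query: str, technique: str) -> List[str]:
    """Lowercase, split on whitespace, and apply the chosen normalizer."""
    normalizer = NORMALIZERS[technique]
    return [normalizer(token) for token in query.lower().split()]


if __name__ == "__main__":
    # Hand-written linear predictors, purely illustrative (not fitted models).
    demo_predictors: Dict[str, Predictor] = {
        "none": lambda f: 0.20,
        "depluralize": lambda f: 0.25 - 0.05 * f[0],
        "stem": lambda f: 0.15 + 0.10 * f[0],
    }
    chosen = select_technique([0.0], demo_predictors)
    print(chosen, normalize_query("retrieving models for queries", chosen))
```

Keeping one predictor per technique and taking the maximum mirrors the "composite of models" idea: each regressor only has to estimate retrieval quality for its own technique, and the choice is made per query rather than once for the whole collection.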