Don't have a stemmer?: be un+concern+ed
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Text normalization transforms words into a base form so that terms from the same equivalence class match. Traditionally, information retrieval systems employ stemming to remove derivational affixes. Depluralization, the transformation of plurals into singular forms, is also used as a lighter-weight normalization technique that preserves more of the precise lexical semantics of the text. Experimental results suggest that the choice of text normalization technique should be made individually for each topic to enhance retrieval accuracy. This paper proposes a hybrid approach: a query-based selection model that chooses the appropriate text normalization technique (stemming, depluralization, or no normalization). The selection model uses ambiguity features extracted from queries to train a composite of Support Vector Regression (SVR) models that predict which text normalization technique yields the highest Mean Average Precision (MAP). Our study suggests that such a selection model holds promise for improving retrieval accuracy.
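The contrast between the three options, and the per-query selection step, can be sketched as follows. This is an illustrative sketch only: the suffix rules and the `predictors` stand-ins are hypothetical and do not reproduce the paper's stemmer, depluralizer, or trained SVR models.

```python
# Illustrative sketch: the rules and predictor stand-ins below are
# hypothetical, not the paper's actual implementation.

def depluralize(word):
    """Map common English plurals to singular with a few simple rules."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # queries -> query
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # systems -> system
    return word

def naive_stem(word):
    """Strip a few suffixes, Porter-style but greatly simplified."""
    for suffix in ("ization", "ation", "ing", "ness", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]  # running -> runn (note over-stemming)
    return word

def select_technique(query_features, predictors):
    """Pick the technique whose regressor predicts the highest MAP,
    mirroring the paper's per-query selection idea."""
    return max(predictors, key=lambda t: predictors[t](query_features))

# Stand-ins for trained per-technique SVR models; in practice each would
# map query ambiguity features to a predicted MAP score.
predictors = {
    "none":     lambda feats: 0.30,
    "stem":     lambda feats: 0.35,
    "deplural": lambda feats: 0.33,
}

print(select_technique({"ambiguity": 0.7}, predictors))  # -> stem
```

The stemmer's over-stemming of "running" to "runn" is the kind of semantic loss that motivates offering depluralization, or no normalization at all, as per-query alternatives.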