NLP-driven IR: evaluating performances over a text classification task

  • Authors:
  • Roberto Basili;Alessandro Moschitti;Maria Teresa Pazienza

  • Affiliations:
  • University of Rome Tor Vergata, Department of Computer Science, Systems and Production, Roma, Italy;University of Rome Tor Vergata, Department of Computer Science, Systems and Production, Roma, Italy;University of Rome Tor Vergata, Department of Computer Science, Systems and Production, Roma, Italy

  • Venue:
  • IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
  • Year:
  • 2001

Abstract

Although several attempts have been made to introduce Natural Language Processing (NLP) techniques into Information Retrieval (IR), most have failed to demonstrate an improvement in performance. In this paper, Text Classification (TC) is taken as the IR task, and the effect of the linguistic capabilities of the underlying system is studied. A novel model for TC, which extends a well-known statistical model (i.e. Rocchio's formula [Ittner et al., 1995]) and is applied to linguistic features, has been defined and tested experimentally. The proposed model represents an effective feature selection methodology. All the experiments show a significant improvement over other purely statistical methods (e.g. [Yang, 1999]), thus stressing the relevance of the available linguistic information. Moreover, the derived classifier reaches the performance (about 85%) of the best known models (i.e. Support Vector Machines (SVM) and K-Nearest Neighbour (KNN)), which are characterized by a higher computational complexity for training and processing.
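For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard Rocchio profile classifier, not the extended model proposed in the paper. It assumes documents are already represented as feature-weight dictionaries (for example, TF-IDF weights); the function names `rocchio_profiles` and `classify`, and the default values of `beta` and `gamma`, are illustrative assumptions rather than settings taken from the paper.

```python
from collections import defaultdict


def rocchio_profiles(docs, labels, beta=16.0, gamma=4.0):
    """Build one weighted profile per class with the standard Rocchio formula.

    docs   : list of {feature: weight} dicts (weights assumed precomputed)
    labels : list of class names, one per document
    beta, gamma : illustrative defaults, not values reported in the paper
    """
    profiles = {}
    for c in set(labels):
        pos = [d for d, y in zip(docs, labels) if y == c]
        neg = [d for d, y in zip(docs, labels) if y != c]

        # Sum feature weights over positive and negative examples.
        pos_sum, neg_sum = defaultdict(float), defaultdict(float)
        for d in pos:
            for f, w in d.items():
                pos_sum[f] += w
        for d in neg:
            for f, w in d.items():
                neg_sum[f] += w

        profile = {}
        # Features absent from all positive examples can only score <= 0,
        # so iterating over pos_sum is enough.
        for f in pos_sum:
            value = beta * pos_sum[f] / len(pos)
            if neg:
                value -= gamma * neg_sum.get(f, 0.0) / len(neg)
            # Clip at zero: non-positive features are dropped from the profile.
            if value > 0:
                profile[f] = value
        profiles[c] = profile
    return profiles


def classify(doc, profiles):
    """Assign the class whose profile has the largest dot product with doc."""
    def score(profile):
        return sum(w * profile.get(f, 0.0) for f, w in doc.items())
    return max(profiles, key=lambda c: score(profiles[c]))


# Toy data with hypothetical features, for illustration only.
docs = [
    {"stock": 1, "market": 1},
    {"stock": 1, "bank": 1},
    {"goal": 1, "match": 1},
    {"goal": 1, "market": 1},
]
labels = ["fin", "fin", "sport", "sport"]
profiles = rocchio_profiles(docs, labels)
print(classify({"stock": 1, "market": 1}, profiles))
```

In this sketch, clipping at zero removes features whose negative evidence outweighs their positive evidence, which is one simple way a Rocchio-style profile can act as a feature selector. How the paper's model adapts the formula to linguistic features is not specified in the abstract and is not reproduced here.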