NLP-driven IR: evaluating performances over a text classification task

Authors:
Roberto Basili;Alessandro Moschitti;Maria Teresa Pazienza
Affiliations:
University of Rome Tor Vergata, Department of Computer Science, Systems and Production, Roma, Italy;University of Rome Tor Vergata, Department of Computer Science, Systems and Production, Roma, Italy;University of Rome Tor Vergata, Department of Computer Science, Systems and Production, Roma, Italy
Venue:
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Year:
2001

Citing 12
Cited 4

Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
Automated learning of decision rules for text categorization

ACM Transactions on Information Systems (TOIS)
Training algorithms for linear text classifiers

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Context-sensitive learning methods for text categorization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Feature selection, perceptron learning, and a usability case study for text categorization

Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
An Evaluation of Statistical Approaches to Text Categorization

Information Retrieval
Induction of Decision Trees

Machine Learning
Text Categorization with Support Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Short Query Linguistic Expansion Techniques: Palliating One-Word Queries by Providing Intermediate Structure to Text

SCIE '97 International Summer School on Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology

Ontology-based metadata generation from semi-structured information

Proceedings of the 1st international conference on Knowledge capture
A Hybrid Approach to Optimize Feature Selection Process in Text Classification

AI*IA 01 Proceedings of the 7th Congress of the Italian Association for Artificial Intelligence on Advances in Artificial Intelligence
Supervised document classification based upon domain-specific term taxonomies

International Journal of Metadata, Semantics and Ontologies
RitroveRAI: a web application for semantic indexing and hyperlinking of multimedia news

ISWC'05 Proceedings of the 4th international conference on The Semantic Web

Abstract

Although several attempts have been made to introduce Natural Language Processing (NLP) techniques into Information Retrieval (IR), most of them have failed to demonstrate any gain in performance. In this paper, Text Classification (TC) is taken as the IR task, and the effect of the linguistic capabilities of the underlying system is studied. A novel model for TC, which extends a well-known statistical model (i.e. Rocchio's formula [Ittner et al., 1995]) and is applied to linguistic features, is defined and tested experimentally. The proposed model represents an effective feature selection methodology. All the experiments show a significant improvement over other purely statistical methods (e.g. [Yang, 1999]), thus stressing the relevance of the available linguistic information. Moreover, the derived classifier reaches the performance (about 85%) of the best known models (i.e. Support Vector Machines (SVM) and K-Nearest Neighbour (KNN)), which are characterized by a higher computational complexity for training and processing.
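
For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of the classic Rocchio profile, not of the extended model proposed in the paper. It assumes documents are already represented as dictionaries mapping a feature (a word or a linguistic feature) to a weight. The function names, the default values of `beta` and `gamma`, and the cosine scoring step are illustrative assumptions that do not come from this record.

```python
import math
from collections import defaultdict


def mean_vector(docs):
    """Average a list of sparse vectors (dicts of feature -> weight)."""
    total = defaultdict(float)
    for doc in docs:
        for feature, weight in doc.items():
            total[feature] += weight
    count = len(docs)
    if count == 0:
        return {}
    return {feature: weight / count for feature, weight in total.items()}


def rocchio_profile(positive_docs, negative_docs, beta=16.0, gamma=4.0):
    """Classic Rocchio category profile.

    Each feature gets beta * (mean weight in positive documents)
    minus gamma * (mean weight in negative documents).
    beta and gamma are assumed defaults, not values from the paper.
    Features whose resulting weight is not positive are dropped,
    which is why the formula also behaves as a feature selector.
    """
    positive_mean = mean_vector(positive_docs)
    negative_mean = mean_vector(negative_docs)
    profile = {}
    for feature in set(positive_mean) | set(negative_mean):
        weight = (beta * positive_mean.get(feature, 0.0)
                  - gamma * negative_mean.get(feature, 0.0))
        if weight > 0.0:
            profile[feature] = weight
    return profile


def cosine(vector_a, vector_b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * vector_b.get(f, 0.0) for f, w in vector_a.items())
    norm_a = math.sqrt(sum(w * w for w in vector_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vector_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy usage with made-up feature weights.
finance_docs = [{"bank": 1.0, "loan": 1.0}, {"bank": 1.0}]
other_docs = [{"match": 1.0, "bank": 1.0}]
toy_profile = rocchio_profile(finance_docs, other_docs, beta=1.0, gamma=1.0)
toy_score = cosine({"loan": 2.0}, toy_profile)
```

In this toy run, "bank" occurs equally often inside and outside the category, so its weight is not positive and it is removed from the profile. A document would then be assigned to the category when its similarity to the profile exceeds a threshold, which would have to be tuned on held-out data.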