Representation quality in text classification: an introduction and experiment

Authors:
David D. Lewis
Venue:
HLT '90 Proceedings of the workshop on Speech and Natural Language
Year:
1990


Abstract

The way in which text is represented has a strong impact on the performance of text classification (retrieval and categorization) systems. We discuss the operation of text classification systems, introduce a theoretical model of how text representation impacts their performance, and describe how the performance of text classification systems is evaluated. We then present the results of an experiment on improving text representation quality, as well as an analysis of the results and the directions they suggest for future research.