Evaluating text categorization

Authors:
David D. Lewis
Venue:
HLT '91 Proceedings of the workshop on Speech and Natural Language
Year:
1991

Abstract

While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues.

In this paper I discuss a variety of ways of evaluating the effectiveness of text categorization systems, drawing both on reported categorization experiments and on methods used in evaluating query-driven retrieval. I also consider the extent to which the same evaluation methods may be used with systems for text extraction, a more complex task. In evaluating either kind of system, the purpose for which the output is to be used is crucial in choosing appropriate evaluation methods.