While certain standard procedures are widely used for evaluating text retrieval systems and algorithms, the same is not true for text categorization. Omission of important data from reports is common and methods of measuring effectiveness vary widely. This has made judging the relative merits of techniques for text categorization difficult and has disguised important research issues.

In this paper I discuss a variety of ways of evaluating the effectiveness of text categorization systems, drawing both on reported categorization experiments and on methods used in evaluating query-driven retrieval. I also consider the extent to which the same evaluation methods may be used with systems for text extraction, a more complex task. In evaluating either kind of system, the purpose for which the output is to be used is crucial in choosing appropriate evaluation methods.