The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task benefit research on that task, because they allow researchers to compare their systems experimentally by comparing the results obtained on the benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, these benefits have been limited by the fact that different researchers have “carved” different subsets out of this collection and tested their systems on only one of these subsets; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means of comparing TC systems that have been, or will be, tested on these different subsets. © 2005 Wiley Periodicals, Inc.