Automated text categorization consists of developing computer programs that autonomously assign texts to predefined categories on the basis of their content. Such applications are made possible by supervised learning, which requires training on manually labeled documents. During this phase, the system discovers links between relevant terms (the vocabulary) and the identified categories. However, constructing a training set is slow and expensive. This paper suggests a way to help text classifiers gather vocabulary when the number of examples is limited, in which case the success rate is not at its best. It proposes analyzing word co-occurrence within a collection of unlabeled documents in order to augment the vocabulary used by the classifier. The representation of new documents to classify would benefit from this augmented vocabulary. The expected result is an improvement in the classifier's success rate despite its limited training set.
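The idea of using co-occurrence in unlabeled text to enrich a document's representation can be illustrated with a minimal sketch. This is not the paper's method, which the abstract does not specify. It is one possible reading, under these assumptions: co-occurrence is counted at document level, a word outside the classifier's vocabulary is mapped to the vocabulary terms it co-occurs with, and a hypothetical `min_count` threshold filters weak associations. The function names and the toy data are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(unlabeled_docs):
    """Count, for each word pair, the number of documents containing both."""
    pair_counts = defaultdict(Counter)
    for doc in unlabeled_docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return pair_counts


def augmented_features(doc, vocabulary, pair_counts, min_count=2):
    """Bag-of-words over the vocabulary; unknown words contribute the
    vocabulary terms they co-occur with in the unlabeled collection.
    min_count is an assumed threshold, not a value from the paper."""
    features = Counter()
    for word in doc.lower().split():
        if word in vocabulary:
            features[word] += 1
            continue
        for neighbor, count in pair_counts.get(word, {}).items():
            if neighbor in vocabulary and count >= min_count:
                features[neighbor] += 1
    return features


# Toy data (hypothetical): a tiny unlabeled collection and a small
# vocabulary learned from a limited training set.
unlabeled = [
    "physician treats patient",
    "physician treats illness patient",
    "goalkeeper saves penalty",
    "goalkeeper saves shot",
]
vocabulary = {"patient", "penalty"}
counts = cooccurrence_counts(unlabeled)
print(augmented_features("the physician arrived", vocabulary, counts))
```

Here "physician" is absent from the classifier's vocabulary, but it co-occurs with "patient" in two unlabeled documents, so the new document still receives a feature the classifier knows. A real system would likely use a weighted association measure rather than raw counts.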