Many text mining applications, particularly those investigating Text Classification (TC), require experiments on common text collections so that results can be compared with alternative approaches. For single-label TC, most text collections (textual data sources) in their original form have at least one of the following limitations: the overall volume of textual data is too large for convenient experimentation; there are many predefined classes; most classes contain only a few documents; some documents are labeled with a single class whereas others have multiple classes; and some documents contain little or no actual text content. In this paper, we propose a standard approach for automatically extracting "qualified" document-bases from a given textual data source, so that they can be used more effectively and reliably in single-label TC experiments. Experimental results demonstrate that document-bases extracted with our approach can be used effectively in single-label TC experiments.
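To make the listed limitations concrete, the following minimal Python sketch filters a labeled collection down to a smaller single-label document-base. It is an illustration only and not the extraction procedure proposed in the paper: the function name `extract_document_base`, the parameters `top_k_classes` and `min_words`, and the toy documents are all assumptions introduced here.

```python
from collections import Counter


def extract_document_base(docs, top_k_classes=10, min_words=5):
    """Filter (text, labels) pairs into a single-label document-base.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    # Drop multi-label documents and documents with little or no text.
    single = [
        (text, labels[0])
        for text, labels in docs
        if len(labels) == 1 and len(text.split()) >= min_words
    ]
    # Keep only the most populated classes, which limits both the number
    # of classes and the number of classes with very few documents.
    counts = Counter(label for _, label in single)
    kept = {label for label, _ in counts.most_common(top_k_classes)}
    return [(text, label) for text, label in single if label in kept]


# Toy collection (invented for illustration).
docs = [
    ("wheat prices rose sharply this week", ["grain"]),
    ("wheat exports fell again this month", ["grain"]),
    ("oil output was cut by producers", ["crude"]),
    ("oil and wheat both moved higher", ["crude", "grain"]),  # multi-label
    ("", ["grain"]),                                          # no text
    ("gold", ["gold"]),                                       # too short
]
base = extract_document_base(docs, top_k_classes=1, min_words=3)
```

With these toy inputs, only the two single-label "grain" documents remain, since the multi-label, empty, and one-word documents are discarded and "grain" is the most populated class.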