Text categorization is an important research field within text mining. Documents are often full of class-independent "general" words that many documents and classes share. These "general" words harm text categorization rather than contribute to it. Inspired by the human cognitive procedure in text classification, we propose a novel approach, the Class Core Extraction (CCE) method, to extract "core" terms from each class. The "core" terms, which include not only single words but also combinations of words that act as a simple description of context, must be terms with strong distinguishing power. For the testing phase, we also propose a suitable algorithm, which we call the "lottery" algorithm, that uses a weighted matching strategy to make the final categorization decision. Comparative experiments on two datasets show that our approach is more accurate than the k-nearest-neighbor (kNN) based classifier and achieves outstanding efficiency compared with the Support Vector Machine (SVM) based classifier.
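The abstract does not spell out how the weighted matching works, so the following is only a rough sketch of the general idea and not the authors' "lottery" algorithm. It assumes each class already has a set of core terms with numeric weights, that a core term is either a single word or a combination of words, and that a combination matches when all of its words occur in the document. The name `weighted_match`, the `CoreTerms` layout, and the example weights are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical data layout: class label -> {core term -> weight}.
# A core term is a frozenset of words, so it can hold one word or a combination.
CoreTerms = Dict[str, Dict[FrozenSet[str], float]]


def weighted_match(tokens: Iterable[str], cores: CoreTerms) -> Optional[str]:
    """Return the class whose matched core terms have the largest total weight."""
    words = set(tokens)
    # Assumption: a core term matches when every one of its words is in the document.
    scores = {
        label: sum(w for term, w in terms.items() if term <= words)
        for label, terms in cores.items()
    }
    best = max(scores, key=scores.get, default=None)
    # No core term matched at all: leave the document unassigned.
    if best is None or scores[best] == 0:
        return None
    return best


# Illustrative core terms and weights, invented for this example.
cores: CoreTerms = {
    "sports": {frozenset({"goal"}): 1.0, frozenset({"penalty", "kick"}): 2.0},
    "finance": {frozenset({"stock"}): 1.5, frozenset({"penalty", "fee"}): 2.0},
}
print(weighted_match("the late penalty kick decided the match".split(), cores))
```

In this toy example the word "penalty" alone is ambiguous, but the combination "penalty" plus "kick" matches only the sports class, which illustrates why word combinations can carry more distinguishing power than single words.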