Support Vector Machines (SVMs) have been applied to Chinese official document classification under a One-against-All (OAA) multi-class scheme. Several data retrieval techniques, including sentence segmentation, term weighting, and feature extraction, are used in preprocessing. We observe that most documents whose contents are indistinguishable are classified poorly. The traditional solution is to add misclassified documents to the training set in order to adjust the classification rules. In this paper, indistinguishable documents are observed to be informative for strengthening prediction performance, since the current model predicts their labels with low confidence. A general approach is proposed that uses SVM decision values to identify indistinguishable documents. Based on verified classification results and the distinguishability of documents, four learning strategies that select certain documents to add to the training set are proposed to improve classification performance. Experiments show that indistinguishable documents can be identified with high probability and are informative for the learning strategies. Furthermore, LMID, which adds both misclassified and indistinguishable documents to the training set, is the most effective learning strategy for SVM classification of a large set of Chinese official documents in terms of computing efficiency and classification accuracy.
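The selection idea can be illustrated with a minimal sketch. The abstract does not state the exact rule for distinguishability, so the code below assumes one plausible criterion: a document is treated as indistinguishable when the gap between its two highest OAA decision values falls below a threshold. The function names, the dictionary keys other than LMID, and the threshold of 0.5 are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def top_two_margin(decision_values: Sequence[float]) -> Tuple[int, float]:
    """Return the predicted class index and the gap between the best and
    runner-up OAA decision values (assumed confidence measure)."""
    ranked = sorted(
        range(len(decision_values)),
        key=lambda k: decision_values[k],
        reverse=True,
    )
    best, runner_up = ranked[0], ranked[1]
    return best, decision_values[best] - decision_values[runner_up]


def select_for_retraining(
    decisions: Sequence[Sequence[float]],
    true_labels: Sequence[int],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Dict[str, List[int]]:
    """Group document indices by whether they are misclassified,
    indistinguishable (small margin), or either of the two (LMID-style union)."""
    selected: Dict[str, List[int]] = {
        "misclassified": [],
        "indistinguishable": [],
        "lmid": [],
    }
    for i, (values, label) in enumerate(zip(decisions, true_labels)):
        predicted, margin = top_two_margin(values)
        wrong = predicted != label      # verified label disagrees with the model
        unsure = margin < threshold     # low-confidence prediction
        if wrong:
            selected["misclassified"].append(i)
        if unsure:
            selected["indistinguishable"].append(i)
        if wrong or unsure:
            selected["lmid"].append(i)
    return selected


# One row of decision values per document, one column per OAA classifier.
decisions = [
    [1.2, -0.8, -1.1],     # confident and correct
    [0.1, 0.05, -0.9],     # correct but low margin
    [-0.3, 0.9, -1.0],     # confident but wrong
    [-0.2, -0.25, -0.1],   # wrong and low margin
]
true_labels = [0, 0, 2, 0]
result = select_for_retraining(decisions, true_labels)
print(result)
```

In practice the decision values would come from the trained OAA classifiers, and the documents selected under the chosen strategy would be added to the training set before retraining.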