A multi-class SVM classification system based on learning methods from indistinguishable Chinese official documents

Authors:
JuiHsi Fu;SingLing Lee
Affiliations:
Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, Minhsiung Township, 62162 Chiayi, Taiwan, ROC;Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, Minhsiung Township, 62162 Chiayi, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2012


Abstract

A Support Vector Machine (SVM) classifier has been developed for Chinese official document classification using the One-against-All (OAA) multi-class scheme. Several data retrieval techniques, including sentence segmentation, term weighting, and feature extraction, are used in preprocessing. We observe that documents whose contents are indistinguishable account for most of the poor classification results. The traditional solution is to add misclassified documents to the training set in order to adjust the classification rules. In this paper, indistinguishable documents are observed to be informative for strengthening prediction performance, since the current model predicts their labels with low confidence. A general approach is proposed that uses SVM decision values to identify indistinguishable documents. Based on verified classification results and the distinguishability of documents, four learning strategies that select certain documents for addition to the training set are proposed to improve classification performance. Experiments show that indistinguishable documents can be identified with high probability and are informative for the learning strategies. Furthermore, LMID, which adds both misclassified and indistinguishable documents to the training set, is the most effective learning strategy for SVM classification of a large set of Chinese official documents in terms of both computing efficiency and classification accuracy.
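
The selection idea behind LMID can be illustrated with a minimal sketch. The abstract does not state the exact criterion used to decide that a document is indistinguishable, so the rule below is an assumption made only for illustration: a document is treated as indistinguishable when the gap between its two largest OAA decision values falls below a threshold. The function names, the `gap_threshold` parameter, and the sample decision values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def predict_with_confidence(decision_values: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted class index, gap between the two largest decision values).

    In a One-against-All scheme each class has its own binary SVM, and the
    predicted class is the one with the largest decision value.
    """
    ranked = sorted(range(len(decision_values)),
                    key=lambda k: decision_values[k], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    return best, decision_values[best] - decision_values[runner_up]


def select_for_retraining(decision_matrix: Sequence[Sequence[float]],
                          true_labels: Sequence[int],
                          gap_threshold: float = 0.5) -> List[int]:
    """Pick documents to add to the training set, LMID-style.

    A document is selected if it is misclassified OR indistinguishable.
    The gap-based test for "indistinguishable" is an assumed stand-in,
    not the paper's stated criterion.
    """
    selected = []
    for i, (values, label) in enumerate(zip(decision_matrix, true_labels)):
        predicted, gap = predict_with_confidence(values)
        misclassified = predicted != label
        indistinguishable = gap < gap_threshold
        if misclassified or indistinguishable:
            selected.append(i)
    return selected


# Hypothetical decision values for four documents and three classes.
decision_matrix = [
    [1.8, -1.2, -0.9],   # confident and correct
    [0.2, 0.1, -1.0],    # correct, but the top two classes are nearly tied
    [-0.5, 1.4, -1.1],   # confident but wrong (true class is 0)
    [-1.0, -0.8, 0.9],   # confident and correct
]
true_labels = [0, 0, 0, 2]

print(select_for_retraining(decision_matrix, true_labels))  # [1, 2]
```

Document 1 is selected because its prediction is low-confidence even though it is correct, and document 2 because it is misclassified; the two confidently correct documents are left out, which keeps the added training data small.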