Solving multi-label text categorization problem using support vector machine approach with membership function

Authors:
Tai-Yue Wang;Huei-Min Chiang
Affiliations:
Department of Industrial and Information Management, National Cheng Kung University, 1 Ta-Shueh Road, Tainan City 70101, Taiwan, ROC;Department of Information Management, Nan Jeon Institute of Technology, No.178, Chaoqin Rd., Yanshui District, Tainan City 73746, Taiwan, ROC
Venue:
Neurocomputing
Year:
2011

Abstract

The pervasiveness of information on the Internet means that an ever-increasing number of documents must be classified. Text categorization is performed not only by domain experts but also by automatic text categorization systems, so a text categorization system with a multi-label classifier is needed to process this large volume of documents. In this study, a multi-label text categorization system is proposed to classify multi-label documents. Data mapping transforms the data from a high-dimensional space into a lower-dimensional space of paired SVM output values, which lowers the computational complexity. A pairwise comparison approach is then applied to set the membership function for each predicted class, so that all possible classes can be judged. To better illustrate the proposed model, a comparative study on the Reuters data sets is carried out against several multi-label approaches: Naive Bayes, Multi-Label Mixture, Jaccard Kernel, and BP-MLL. The empirical results indicate that the proposed system performs better than the other methods on the overall performance indices, although these comparisons were made without knowledge of the original parameter settings. The comparative studies also show that the probability of a document appearing in a correctly predicted class and the probability of it appearing in a wrongly predicted class are important properties. From the empirical results we conclude that a membership function value of 0.5 is a good criterion for separating correctly classified from incorrectly classified documents.
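
The abstract does not give the form of the membership function, so the following is only a rough sketch of the general idea: pairwise SVM outputs are turned into a per-class membership value, and every class whose membership reaches 0.5 is kept as a predicted label. The logistic squashing, the averaging over opponents, the function names, and the example scores and category names are all assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(z):
    """Squash a raw decision value into (0, 1). Assumed here, not taken from the paper."""
    return 1.0 / (1.0 + math.exp(-z))


def class_memberships(pairwise_scores, classes):
    """Average pairwise evidence for each class.

    pairwise_scores[(a, b)] is a hypothetical pairwise SVM decision value:
    positive values favor class a, negative values favor class b.
    """
    totals = {c: 0.0 for c in classes}
    for (a, b), score in pairwise_scores.items():
        p = sigmoid(score)
        totals[a] += p          # evidence that the document belongs to a
        totals[b] += 1.0 - p    # complementary evidence for b
    n_opponents = len(classes) - 1
    return {c: totals[c] / n_opponents for c in classes}


def predict_labels(pairwise_scores, classes, threshold=0.5):
    """Keep every class whose membership reaches the threshold (multi-label output)."""
    memberships = class_memberships(pairwise_scores, classes)
    return sorted(c for c in classes if memberships[c] >= threshold)


# Illustrative values only: three categories and made-up pairwise decision values.
classes = ["earn", "grain", "trade"]
scores = {
    ("earn", "grain"): -2.0,
    ("earn", "trade"): -1.5,
    ("grain", "trade"): 0.2,
}

print(predict_labels(scores, classes))  # ['grain', 'trade']
```

Because each class is scored separately against the threshold, a document can receive several labels at once, which is what distinguishes this scheme from a single-winner pairwise vote.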