A new inductive learning method for multilabel text categorization

Authors:
Yu-Chuan Chang;Shyi-Ming Chen;Churn-Jung Liau
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.;Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.;Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.
Venue:
IEA/AIE'06 Proceedings of the 19th International Conference on Advances in Applied Artificial Intelligence: Industrial, Engineering and Other Applications of Applied Intelligent Systems
Year:
2006

Abstract

In this paper, we present a new inductive learning method for multilabel text categorization. The proposed method uses a mutual information measure to select terms and constructs a document descriptor vector for each training document of a category based on these terms. These document descriptor vectors form a document descriptor matrix. The method also uses the document descriptor vectors to construct a document-similarity matrix based on the cosine similarity measure. It then constructs a term-document relevance matrix by taking the inner product of the document descriptor matrix and the document-similarity matrix. From this matrix, the method infers the degree of relevance of the selected terms and uses it to construct the category descriptor vector of each category. The relevance score between each category and a testing document is then calculated as the inner product of the category descriptor vector and the document descriptor vector of the testing document, and the maximum relevance score L over all categories is chosen. If the relevance score between a category and the testing document divided by L is not less than a predefined threshold value λ between zero and one, then the document is classified into that category. We also compare the classification accuracy of the proposed method with that of existing learning methods (i.e., Find Similar, Naïve Bayes, Bayes Nets and Decision Trees) in terms of the micro-averaged break-even point for categorizing the "Reuters-21578 Apté split" data set. The proposed method achieves a higher average accuracy than the existing methods.
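The final classification rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `inner_product` and `assign_categories`, the category names, and all vector values are hypothetical, and the sketch assumes the category descriptor vectors have already been constructed. The handling of the case where no category has a positive score is also an assumption, since the abstract does not address it.

```python
from typing import Dict, List, Sequence


def inner_product(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def assign_categories(
    category_vectors: Dict[str, Sequence[float]],
    doc_vector: Sequence[float],
    lam: float,
) -> List[str]:
    """Return every category whose relevance score, divided by the
    maximum score L, is not less than the threshold lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie between zero and one")

    # Relevance score of each category: inner product of its category
    # descriptor vector with the document descriptor vector.
    scores = {
        name: inner_product(vec, doc_vector)
        for name, vec in category_vectors.items()
    }
    if not scores:
        return []

    max_score = max(scores.values())  # L in the abstract
    if max_score <= 0:
        # Assumption (not stated in the abstract): with no positive
        # relevance score, the document is assigned to no category.
        return []

    return [name for name, s in scores.items() if s / max_score >= lam]


# Hypothetical category descriptor vectors over three selected terms.
categories = {
    "grain": [0.5, 0.0, 0.5],
    "wheat": [0.25, 0.25, 0.5],
    "trade": [0.0, 1.0, 0.0],
}
test_document = [1.0, 0.0, 1.0]

print(assign_categories(categories, test_document, lam=0.5))
print(assign_categories(categories, test_document, lam=0.8))
```

Because the scores are normalized by L, the top-scoring category always passes the threshold, and a larger λ admits fewer additional labels. In this toy example the scores are 1.0, 0.75 and 0.0, so λ = 0.5 selects two categories while λ = 0.8 selects only the top one.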