Combining bi-gram of character and word to classify two-class Chinese texts in two steps

Authors:
Xinghua Fan;Difei Wan;Guoying Wang
Affiliations:
College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, P.R. China;College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, P.R. China;College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, P.R. China
Venue:
RSCTC'06 Proceedings of the 5th international conference on Rough Sets and Current Trends in Computing
Year:
2006

Abstract

This paper presents a two-step method that combines two types of features for two-class Chinese text categorization. In the first step, character bi-grams are used as candidate features and a Naive Bayesian classifier classifies the texts; the fuzzy area between the two categories is then fixed directly from the outputs of the classifier. In the second step, word bi-grams whose parts of speech are verb or noun are used as candidate features, and a Naive Bayesian classifier of the same kind as in the first step handles the documents that fall into the fuzzy area, that is, those whose classification in the first step is considered unreliable. Our experiment validated the soundness of the proposed method, which achieved precision, recall and F1 of 97.65%, 97.00% and 97.31%, respectively, on a test set of 12,600 Chinese texts.
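
The two-step flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the fuzzy area is modelled as a symmetric band `|log-odds| < margin` with a hypothetical `margin` parameter, Laplace smoothing is used, and the word bi-grams are formed after filtering a pre-segmented, POS-tagged document down to its nouns and verbs (tags `"n"` and `"v"` are illustrative). All function and class names are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Step-1 features: overlapping two-character substrings."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def word_bigrams(tagged, keep=("n", "v")):
    """Step-2 features: bi-grams over the words tagged as noun or verb.

    `tagged` is a list of (word, pos) pairs from an external segmenter
    and tagger (assumed; not part of this sketch).
    """
    words = [w for w, pos in tagged if pos in keep]
    return [words[i] + "_" + words[i + 1] for i in range(len(words) - 1)]


class TwoClassNB:
    """Multinomial Naive Bayes for labels 0 and 1 with Laplace smoothing."""

    def fit(self, docs, labels):
        # docs: list of feature lists; both classes need at least one document.
        self.counts = [Counter(), Counter()]
        self.n_docs = [0, 0]
        for feats, y in zip(docs, labels):
            self.counts[y].update(feats)
            self.n_docs[y] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        self.totals = [sum(c.values()) for c in self.counts]
        return self

    def log_odds(self, feats):
        """Return log P(1 | doc) - log P(0 | doc); unseen features are skipped."""
        v = len(self.vocab)
        score = math.log(self.n_docs[1] / self.n_docs[0])
        for f in feats:
            if f not in self.vocab:
                continue
            p1 = (self.counts[1][f] + 1) / (self.totals[1] + v)
            p0 = (self.counts[0][f] + 1) / (self.totals[0] + v)
            score += math.log(p1 / p0)
        return score


def classify_two_step(text, tagged, nb_char, nb_word, margin):
    """Return (label, step): step is 1 if the character model decided,
    2 if the document fell into the fuzzy area and the word model decided."""
    s = nb_char.log_odds(char_bigrams(text))
    if abs(s) >= margin:  # outside the fuzzy area: accept step 1
        return int(s > 0), 1
    s2 = nb_word.log_odds(word_bigrams(tagged))
    return int(s2 > 0), 2  # a tie (s2 == 0) falls to label 0
```

In this sketch `nb_char` and `nb_word` are two `TwoClassNB` instances trained on character bi-grams and word bi-grams respectively. A larger `margin` sends more documents to the second step, so in practice it would be tuned on held-out data.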