Feature reinforcement approach to poly-lingual text categorization

Authors:
Chih-Ping Wei;Huihua Shi;Christopher C. Yang
Affiliations:
Institute of Technology Management, National Tsing Hua University, Taiwan, ROC;Department of Information Management, National Sun Yat-sen University, Taiwan, ROC;Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Venue:
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
Year:
2007

Citing 14
Cited 0

Automated learning of decision rules for text categorization

ACM Transactions on Information Systems (TOIS)
An example-based mapping method for text categorization and retrieval

ACM Transactions on Information Systems (TOIS)
Expert network: effective and efficient learning from human decisions in text categorization and retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Some advances in transformation-based part of speech tagging

AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
A comparison of classifiers and document representations for the routing problem

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Combining classifiers in text categorization

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Inductive learning algorithms and representations for text categorization

Proceedings of the seventh international conference on Information and knowledge management
Using a generalized instance set for automatic text categorization

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Distributional clustering of words for text classification

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Context-sensitive learning methods for text categorization

ACM Transactions on Information Systems (TOIS)
Combination and boundary detection approaches on Chinese indexing

Journal of the American Society for Information Science - Special topic issue on digital libraries: part 2
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
A Comparative Study on Feature Selection in Text Categorization

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws

Journal of the American Society for Information Science and Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the rapid emergence and proliferation of Internet and the trend of globalization, a tremendous amount of textual documents written in different languages are electronically accessible online. Poly-lingual text categorization (PLTC) refers to the automatic learning of a text categorization model(s) from a set of preclassified training documents written in different languages and the subsequent assignment of unclassified poly-lingual documents to predefined categories on the basis of the induced text categorization model(s). Although PLTC can be approached as multiple independent monolingual text categorization problems, this naïve approach employs only the training documents of the same language to construct a monolingual classifier and fails to utilize the opportunity offered by poly-lingual training documents. In this study, we propose a feature reinforcement approach to PLTC that takes into account the training documents of all languages when constructing a monolingual classifier for a specific language. Using the independent monolingual text categorization (MnTC) technique as performance benchmarks, our empirical evaluation results show that the proposed PLTC technique achieves higher classification accuracy than the benchmark technique does in both English and Chinese corpora.