Macro features based text categorization

Authors:
Dandan Wang;Qingcai Chen;Xiaolong Wang;Buzhou Tang
Affiliations:
MOS-MS Key lab of NLP & Speech, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, P.R. China (all authors)
Venue:
ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
Year:
2011



Abstract

Text categorization (TC) is one of the key techniques in web information processing. Many approaches to TC have been proposed. Most of them represent a text by the distributions and relationships of its terms; few take document-level relationships into account. In this paper, document-level distributions and relationships are used as a novel type of feature for TC. We call them macro features to distinguish them from term-based features. Two methods are proposed for macro feature extraction. The first is a semi-supervised method based on document clustering. The second constructs the macro feature vector of a text using the centroid of each text category. Experiments on the standard corpora Reuters-21578 and 20-newsgroup show that the proposed methods bring a substantial performance improvement when macro features are simply combined with classical term-based features.
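
The centroid-based idea can be illustrated with a minimal sketch. The abstract does not say how a text is related to each category centroid, so the sketch assumes cosine similarity and assumes the centroid is the mean of the category's training vectors. The function names (`category_centroids`, `macro_features`, `combine`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def category_centroids(vectors, labels):
    """Average the term vectors of the training documents in each category."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {
        label: [sum(col) / len(docs) for col in zip(*docs)]
        for label, docs in groups.items()
    }

def macro_features(vec, centroids):
    """Assumed macro feature: similarity of a document to each category centroid."""
    return [cosine(vec, centroids[label]) for label in sorted(centroids)]

def combine(vec, centroids):
    """Concatenate the term-based features with the macro features."""
    return list(vec) + macro_features(vec, centroids)

# Toy term-weight vectors (hypothetical data: three terms, two categories).
train_vectors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
train_labels = ["grain", "grain", "trade"]

centroids = category_centroids(train_vectors, train_labels)
combined = combine([1.0, 0.0, 0.0], centroids)
```

Here `combined` holds the three original term weights followed by one similarity value per category, so any standard classifier could be trained on the longer vector. Whether the paper combines the two feature types by plain concatenation is also an assumption of this sketch.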