Text categorization based on subtopic clusters

Authors:
Francis C. Y. Chik;Robert W. P. Luk;Korris F. L. Chung
Affiliations:
Department of Computing, Hong Kong Polytechnic University;Department of Computing, Hong Kong Polytechnic University;Department of Computing, Hong Kong Polytechnic University
Venue:
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Year:
2005

Abstract

The distribution of documents across topic classes is typically highly skewed. This leads to good micro-averaged performance but less satisfactory macro-averaged performance: micro-averaging pools decisions over all documents and is therefore dominated by the large classes, whereas macro-averaging weights every class equally. By viewing topics as clusters in a high-dimensional space, and assuming that a large topic cluster is in general a mixture of several subtopic clusters, we propose using clustering to determine subtopic clusters for large topic classes. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-averaged performance.
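The sketch below illustrates two ideas from the abstract under stated assumptions. First, it shows with made-up per-class counts (not results from the paper) how a skewed class distribution separates micro-averaged from macro-averaged F1. Second, it splits each large class into subtopic labels by clustering its documents. The abstract does not say which clustering algorithm, document representation, size threshold, or number of subtopics is used. Here plain k-means on dense vectors, the threshold `min_size`, the fixed `k`, and the `"topic#i"` naming scheme are all hypothetical choices. A classifier (SVMs in the paper) would then be trained on the subtopic labels, and its predictions mapped back to parent topics. That training step is omitted here.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: dict[str, tuple[int, int, int]]) -> tuple[float, float]:
    """counts maps class -> (tp, fp, fn). Returns (micro-F1, macro-F1)."""
    # Micro-averaging pools counts over all classes, so large classes dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro-averaging gives each class equal weight, so small classes count as much.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain k-means; the first k points seed the centroids (deterministic, hypothetical choice)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def split_large_classes(docs: list[list[float]], labels: list[str],
                        min_size: int, k: int) -> list[str]:
    """Relabel every class with at least min_size documents into k subtopic labels 'topic#i'."""
    new_labels = list(labels)
    for topic, n in Counter(labels).items():
        if n < min_size:
            continue  # small classes keep their original label
        idx = [i for i, y in enumerate(labels) if y == topic]
        for i, s in zip(idx, kmeans([docs[i] for i in idx], k)):
            new_labels[i] = f"{topic}#{s}"
    return new_labels


def parent_topic(label: str) -> str:
    """Map a subtopic label (or prediction) back to its original topic class."""
    return label.split("#", 1)[0]


if __name__ == "__main__":
    # Made-up counts: one large class and two small ones.
    micro, macro = micro_macro_f1({"earn": (900, 50, 40), "grain": (5, 2, 10), "ship": (3, 1, 12)})
    print(round(micro, 2), round(macro, 2))  # 0.94 0.57

    docs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
    labels = ["earn", "earn", "earn", "earn", "ship"]
    print(split_large_classes(docs, labels, min_size=4, k=2))
    # ['earn#0', 'earn#0', 'earn#1', 'earn#1', 'ship']
```

Mapping predictions back with `parent_topic` means evaluation is still reported over the original topic classes; only the training labels of large classes change.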