The distribution of documents across topic classes is typically highly skewed. As a result, classifiers tend to achieve good micro-averaged performance, which is dominated by the large classes, but much weaker macro-averaged performance, which weights every class equally. Viewing topics as clusters in a high-dimensional space, and assuming that a large topic cluster is in general a mixture of several subtopic clusters, we propose using clustering to identify subtopic clusters within large topic classes. We used Reuters news articles and support vector machines to evaluate whether training on subtopic clusters leads to better macro-averaged performance.
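To make the micro/macro gap concrete, the sketch below computes both averages of F1 for single-label predictions. It is an illustration under stated assumptions, not the evaluation code used in the paper: it treats each document as having exactly one topic (Reuters documents can carry several), and the class sizes in the usage example are invented. The helper `to_parent` assumes a hypothetical naming scheme, `<topic>#<cluster id>`, for subtopic labels, so that predictions made on subtopic clusters can be mapped back to their parent topic before scoring.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: [tp, fp, fn]} for single-label predictions."""
    counts = {c: [0, 0, 0] for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t][0] += 1  # true positive for the correct class
        else:
            counts[p][1] += 1  # false positive for the predicted class
            counts[t][2] += 1  # false negative for the true class
    return counts


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    counts = per_class_counts(y_true, y_pred, labels)
    # Micro-average: pool counts over all classes, so large classes dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro-average: mean of per-class F1, so every class counts equally.
    macro = sum(f1(*counts[c]) for c in labels) / len(labels)
    return micro, macro


def to_parent(label):
    # Hypothetical convention: subtopic labels look like "earn#0", "earn#1".
    return label.split("#", 1)[0]


if __name__ == "__main__":
    # Invented, skewed toy data: one large class and two small ones.
    y_true = ["earn"] * 90 + ["grain"] * 5 + ["ship"] * 5
    y_pred = ["earn"] * 100  # a classifier that always predicts the large class
    micro, macro = micro_macro_f1(y_true, y_pred)
    print(round(micro, 3), round(macro, 3))  # prints: 0.9 0.316
```

A classifier that always predicts the largest class already scores 0.9 micro-averaged F1, while the macro average collapses to about 0.316 because the two small classes each contribute an F1 of zero. This is the gap the subtopic-clustering approach targets.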