We propose a new method for improving the accuracy of text categorization based on two-dimensional clustering. A number of previous probabilistic approaches implicitly assume that texts in the same category are generated from an identical distribution. We empirically show that this assumption is inaccurate, and propose a new framework based on two-dimensional clustering to alleviate the problem. In our method, training texts are clustered so that the assumption is more likely to hold within each cluster, and at the same time, features are also clustered in order to tackle the data-sparseness problem. We conduct experiments to validate the proposed two-dimensional clustering method.
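To illustrate the two-dimensional clustering idea, the sketch below alternately clusters the rows (documents) and columns (features) of a document-term matrix, each side represented by its profile over the other side's current clusters. This is a simplified k-means-style sketch under our own assumptions, not the paper's actual algorithm; the function name, the round-robin initialization, and the alternating scheme are illustrative choices.

```python
import numpy as np

def co_cluster(X, n_doc_clusters, n_feat_clusters, n_iter=10):
    """Alternating two-dimensional clustering of a document-term matrix X.

    Documents are clustered via their feature-cluster profiles, and
    features are clustered via their document-cluster profiles, in turn.
    Illustrative sketch only; not the paper's algorithm.
    """
    n_docs, n_feats = X.shape
    # simple deterministic initialization (round-robin labels)
    doc_labels = np.arange(n_docs) % n_doc_clusters
    feat_labels = np.arange(n_feats) % n_feat_clusters

    def kmeans_step(M, labels, k):
        # centroids of the current clusters (empty clusters stay at zero)
        cents = np.zeros((k, M.shape[1]))
        for c in range(k):
            members = M[labels == c]
            if len(members):
                cents[c] = members.mean(axis=0)
        # reassign every row to its nearest centroid
        dists = ((M[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        # represent each document by its total mass in each feature cluster
        D = np.stack([X[:, feat_labels == c].sum(axis=1)
                      for c in range(n_feat_clusters)], axis=1)
        doc_labels = kmeans_step(D, doc_labels, n_doc_clusters)
        # represent each feature by its total mass in each document cluster
        F = np.stack([X[doc_labels == c].sum(axis=0)
                      for c in range(n_doc_clusters)], axis=1)
        feat_labels = kmeans_step(F, feat_labels, n_feat_clusters)
    return doc_labels, feat_labels
```

On a block-structured count matrix (one group of documents concentrated on one group of features), the alternating updates recover both groupings within a few iterations; projecting features onto document clusters is what smooths away the sparseness of individual word counts.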