Document clustering based on non-negative matrix factorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Entity-based cross-document coreferencing using the Vector Space Model
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Document clustering by concept factorization
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Relative risk and odds ratio: a data mining perspective
Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Introduction to Data Mining, (First Edition)
Introduction to Data Mining, (First Edition)
Criterion functions for document clustering
Criterion functions for document clustering
Statistical Comparisons of Classifiers over Multiple Data Sets
The Journal of Machine Learning Research
Adaptive dimension reduction using discriminant analysis and K-means clustering
Proceedings of the 24th international conference on Machine learning
Mining statistically important equivalence classes and delta-discriminative emerging patterns
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Learning semantic relatedness from term discrimination information
Expert Systems with Applications: An International Journal
A Robust Discriminative Term Weighting Based Linear Discriminant Method for Text Classification
ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining
A comparison of extrinsic clustering evaluation metrics based on formal constraints
Information Retrieval
Exploiting Wikipedia as external knowledge for document clustering
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Non-classical lexical semantic relations
CLS '04 Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics
A comparative study of ontology based term similarity measures on PubMed document clustering
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Locally Consistent Concept Factorization for Document Clustering
IEEE Transactions on Knowledge and Data Engineering
Comparing dimension reduction techniques for document clustering
AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Hi-index | 0.00 |
Text document clustering is a popular task for understanding and summarizing large document collections. Besides the need for efficiency, document clustering methods should produce clusters that are readily understandable as collections of documents relating to particular contexts or topics. Existing clustering methods often ignore term-document semantics while relying upon geometric similarity measures. In this paper, we present an efficient iterative partitional clustering method, CDIM, that maximizes the sum of discrimination information provided by documents. The discrimination information of a document is computed from the discrimination information provided by the terms in it, and term discrimination information is estimated from the currently labeled document collection. A key advantage of CDIM is that its clusters are describable by their highly discriminating terms --- terms with high semantic relatedness to their clusters' contexts. We evaluate CDIM both qualitatively and quantitatively on ten text data sets. In clustering quality evaluation, we find that CDIM produces high-quality clusters superior to those generated by the best methods. We also demonstrate the understandability provided by CDIM, suggesting its suitability for practical document clustering.