In most information retrieval (IR) clustering problems, we cluster the documents directly, working in the document space and using cosine similarity between documents as the similarity measure. In many real-world applications, however, we have knowledge on the word side and wish to transform this knowledge to the document (concept) side. In this paper, we provide a mechanism for this knowledge transformation. To the best of our knowledge, this is the first model for this type of knowledge transformation. The model uses a nonnegative matrix factorization X = F S G^T, where X is the word-document semantic matrix, F gives the posterior probability of a word belonging to a word cluster and represents knowledge in the word space, G gives the posterior probability of a document belonging to a document cluster and represents knowledge in the document space, and S is a scaled matrix factor that provides a condensed view of X. We show how knowledge on words can improve document clustering, i.e., how knowledge in the word space is transformed into the document space. We perform extensive experiments to validate our approach.
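To make the shapes in X = F S G^T concrete, the following is a minimal sketch of a plain nonnegative tri-factorization fitted with standard multiplicative updates for the squared reconstruction error. It is not the model proposed in the paper: it omits the word-side prior knowledge entirely, and the function name `tri_factorize`, its parameters, and the toy matrix are assumptions made only for illustration.

```python
import numpy as np

def tri_factorize(X, k_words, k_docs, n_iter=200, seed=0, eps=1e-9):
    """Approximate a nonnegative X (words x documents) as F @ S @ G.T.

    Hypothetical helper: generic multiplicative updates for
    ||X - F S G^T||^2 under nonnegativity, with no prior knowledge term.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    F = rng.random((n_words, k_words)) + eps   # word-to-word-cluster weights
    S = rng.random((k_words, k_docs)) + eps    # condensed view of X
    G = rng.random((n_docs, k_docs)) + eps     # document-to-document-cluster weights
    for _ in range(n_iter):
        # Each factor is rescaled elementwise, so it stays nonnegative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
    return F, S, G

# Toy word-document matrix (assumed data): 6 words, 4 documents, two topics.
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [4.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 2.0],
    [0.0, 1.0, 2.0, 4.0],
    [0.0, 0.0, 3.0, 3.0],
])

F, S, G = tri_factorize(X, k_words=2, k_docs=2)
doc_labels = G.argmax(axis=1)    # hard document-cluster assignment
word_labels = F.argmax(axis=1)   # hard word-cluster assignment
```

In this sketch the rows of F and G are unnormalized weights rather than posterior probabilities; taking the largest entry in each row gives a hard cluster assignment. Word-side knowledge of the kind the abstract describes would enter as an additional constraint or penalty on F, which is left out here.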