Knowledge transformation from word space to document space

Authors:
Tao Li;Chris Ding;Yi Zhang;Bo Shao
Affiliations:
Florida International University, Miami, FL, USA;University of Texas at Arlington, Arlington, TX, USA;Florida International University, Miami, FL, USA;Florida International University, Miami, FL, USA
Venue:
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2008

Abstract

In most IR clustering problems, we cluster the documents directly, working in the document space and using cosine similarity between documents as the similarity measure. In many real-world applications, however, we have knowledge on the word side and wish to transform this knowledge to the document (concept) side. In this paper, we provide a mechanism for this knowledge transformation. To the best of our knowledge, this is the first model for this type of knowledge transformation. The model uses a nonnegative matrix factorization X = FSG^T, where X is the word-document semantic matrix, F is the posterior probability of a word belonging to a word cluster and represents knowledge in the word space, G is the posterior probability of a document belonging to a document cluster and represents knowledge in the document space, and S is a scaled matrix factor that provides a condensed view of X. We show how knowledge on words can improve document clustering, i.e., how knowledge in the word space is transformed into the document space. We perform extensive experiments to validate our approach.
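
To make the factorization X = FSG^T concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes standard multiplicative updates for the unconstrained squared-error objective, and it assumes that word-side knowledge arrives as a given word-cluster membership matrix that is simply held fixed while S and G are fitted. The function name `tri_factorize`, the parameter `F_prior`, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def tri_factorize(X, k_words, k_docs, F_prior=None, n_iter=200, seed=0, eps=1e-9):
    """Approximate X (words x documents) as F @ S @ G.T with nonnegative factors.

    Assumption: if F_prior (words x k_words) is given, it is treated as fixed
    word-side knowledge and only S and G are updated. This is a simplification
    for illustration, not necessarily how the paper injects word knowledge.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    fix_F = F_prior is not None
    F = np.asarray(F_prior, dtype=float) if fix_F else rng.random((n_words, k_words))
    S = rng.random((k_words, k_docs))
    G = rng.random((n_docs, k_docs))

    # Track the Frobenius reconstruction error, starting before any update.
    errors = [np.linalg.norm(X - F @ S @ G.T)]
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        if not fix_F:
            F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy word-document matrix: words 0-2 occur in documents 0-1, words 3-5 in documents 2-3.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Hypothetical word-side knowledge: a hard assignment of each word to one of two word clusters.
F0 = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)

F, S, G, errors = tri_factorize(X, k_words=2, k_docs=2, F_prior=F0)
doc_labels = G.argmax(axis=1)  # document cluster = largest entry in each row of G
```

Calling `tri_factorize(X, 2, 2)` without `F_prior` fits all three factors from a random start, which gives a baseline for comparing document clusters obtained with and without the word-side knowledge.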