A statistical approach to machine translation
Computational Linguistics
On Clustering Using Random Walks
FST TCS '01 Proceedings of the 21st Conference on Foundations of Software Technology and Theoretical Computer Science
Some Notes on Alternating Optimization
AFSS '02 Proceedings of the 2002 AFSS International Conference on Fuzzy Systems. Calcutta: Advances in Soft Computing
Cluster ensembles: a knowledge reuse framework for combining partitionings
Eighteenth national conference on Artificial intelligence
Information-theoretic co-clustering
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
A program for aligning sentences in bilingual corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Multiple kernel learning, conic duality, and the SMO algorithm
ICML '04 Proceedings of the twenty-first international conference on Machine learning
A phrase-based, joint probability model for statistical machine translation
EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Spectral clustering and transductive learning with multiple views
Proceedings of the 24th international conference on Machine learning
Weighted Graph Cuts without Eigenvectors A Multilevel Approach
IEEE Transactions on Pattern Analysis and Machine Intelligence
WordNet::Similarity: measuring the relatedness of concepts
HLT-NAACL--Demonstrations '04 Demonstration Papers at HLT-NAACL 2004
The ACL Anthology Network corpus
NLPIR4DL '09 Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries
Edge Weight Regularization over Multiple Graphs for Similarity Learning
ICDM '10 Proceedings of the 2010 IEEE International Conference on Data Mining
ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries
Hi-index | 0.00 |
A key problem in document classification and clustering is learning the similarity between documents. Traditional approaches include estimating similarity between feature vectors of documents where the vectors are computed using TF-IDF in the bag-of-words model. However, these approaches do not work well when either similar documents do not use the same vocabulary or the feature vectors are not estimated correctly. In this paper, we represent documents and keywords using multiple layers of connected graphs. We pose the problem of simultaneously learning similarity between documents and keyword weights as an edge-weight regularization problem over the different layers of graphs. Unlike most feature weight learning algorithms, we propose an unsupervised algorithm in the proposed framework to simultaneously optimize similarity and the keyword weights. We extrinsically evaluate the performance of the proposed similarity measure on two different tasks, clustering and classification. The proposed similarity measure outperforms the similarity measure proposed by (Muthukrishnan et al., 2010), a state-of-the-art classification algorithm (Zhou and Burges, 2007) and three different baselines on a variety of standard, large data sets.