Advances in statistical machine translation have produced ready-to-use tools that can translate documents from one language into many others. These translations provide different yet correlated views of the same set of documents. This raises an intriguing question: can the extra information be used to cluster the documents better? Recent work on multiview clustering has answered this question positively. In this work, we propose an alternative approach that addresses the problem within the constrained clustering framework. Unlike traditional Must-Link and Cannot-Link constraints, the constraints generated from machine translation are dense yet noisy. We show how to incorporate constraints of this type by presenting two algorithms, one parametric and one non-parametric. Our algorithms are easy to implement, efficient, and consistently improve clustering on real data, namely the Reuters RCV1/RCV2 Multilingual Dataset. In contrast to existing multiview clustering algorithms, our technique requires neither the compatibility assumption nor the conditional independence assumption, and it involves no subtle parameter tuning.
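To make the idea of dense, noisy constraints concrete, the following is a minimal sketch and not either of the two algorithms proposed in the paper, whose details the abstract does not give. It assumes that the similarity matrix of the translated documents can serve as a dense soft-constraint matrix, that this matrix is simply added to the original-language affinity with a hypothetical weight `beta`, and that a two-way normalized spectral cut is an acceptable stand-in for the clustering step. All function names and the toy data are illustrative.

```python
import numpy as np


def cosine_affinity(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return Xn @ Xn.T


def two_way_spectral_cut(A):
    """Split items into two clusters using the sign of the eigenvector
    belonging to the second-smallest eigenvalue of the symmetric
    normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    return (vecs[:, 1] > 0).astype(int)


def cluster_with_translation_constraints(X_orig, X_trans, beta=1.0):
    """Hypothetical combination rule: treat the translated-view similarity
    as a dense soft-constraint matrix Q and add it, scaled by beta, to the
    original-view affinity A before cutting."""
    A = cosine_affinity(X_orig)
    Q = cosine_affinity(X_trans)
    return two_way_spectral_cut(A + beta * Q)


# Toy term-count matrices: six documents, rows aligned across the two views.
X_orig = np.array([[5, 4, 0, 0], [4, 5, 1, 0], [5, 5, 0, 1],
                   [0, 1, 5, 4], [1, 0, 4, 5], [0, 0, 5, 5]], dtype=float)
X_trans = np.array([[4, 1, 0], [5, 0, 1], [4, 1, 1],
                    [0, 1, 5], [1, 0, 4], [0, 1, 4]], dtype=float)

labels = cluster_with_translation_constraints(X_orig, X_trans, beta=0.5)
print(labels)
```

Which cluster receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. Setting `beta=0` reduces the sketch to ordinary single-view spectral clustering, which gives a baseline for judging whether the translated view helps.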