Automatic clustering of webpages supports a number of information retrieval tasks, such as improving user interfaces, collection clustering, and introducing diversity in search results. Typically, webpage clustering algorithms use only features extracted from the page text. However, the advent of social bookmarking websites, such as StumbleUpon and Delicious, has produced a large amount of user-generated content, including the tags that users associate with webpages. In this paper, we present a subspace-based feature extraction approach that uses tag information to complement the page content and extract highly discriminative features, with the goal of improving clustering performance. We treat page text and tags as two separate views of the data and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We compare our subspace-based approach with a number of baselines that use tag information in other ways, and show that it leads to improved performance on the webpage clustering task. Although our results are for webpage clustering, the same approach can also be used for webpage classification. Finally, we suggest possible future work on leveraging tag information in webpage clustering, especially when tags are available for only a small fraction of the webpages.
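The two-view idea described above can be illustrated with a minimal sketch. It assumes that the shared subspace is found with regularized linear canonical correlation analysis (CCA) and that a basic k-means is the clustering step; the abstract fixes neither choice. The synthetic matrices `X` (standing in for page-text features) and `Y` (standing in for tag features), the regularization value, and the function names `cca_projection` and `kmeans` are all hypothetical and serve only as illustration.

```python
import numpy as np

def cca_projection(X, Y, k, reg=1e-3):
    """Project view X onto its k directions most correlated with view Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Regularized within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten both views with Cholesky factors: T = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    # Singular values of T are the (regularized) canonical correlations.
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])  # map back to the original X basis
    return Xc @ Wx, s[:k]

def kmeans(Z, n_clusters, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(assign == j):  # keep the old center if a cluster empties
                centers[j] = Z[assign == j].mean(axis=0)
    return assign

# Synthetic stand-in data: one latent cluster signal shared by both views.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
latent = 2.0 * labels - 1.0
X = np.outer(latent, rng.normal(size=20)) + 0.5 * rng.normal(size=(200, 20))
Y = np.outer(latent, rng.normal(size=10)) + 0.5 * rng.normal(size=(200, 10))

Z, corrs = cca_projection(X, Y, k=1)   # shared-subspace features
pred = kmeans(Z, n_clusters=2)         # any clustering algorithm could go here
agreement = max(np.mean(pred == labels), np.mean(pred != labels))
print("top canonical correlation:", corrs[0])
print("cluster/label agreement:", agreement)
```

In this sketch the tag view is needed only to learn the projection `Wx`; once learned, the projection is applied to page-text features alone, which is one plausible way to read "any clustering algorithm can then be applied in this subspace".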