In this paper, we present a semi-supervised learning method for web page classification that uses click logs to augment the training data by propagating class labels to similar unlabeled documents. Current state-of-the-art classifiers are supervised and require large amounts of manually labeled data. We hypothesize that unlabeled documents similar to our positively and negatively labeled documents tend to be clicked through for the same user queries. Our method builds on this hypothesis and augments the training set by modeling the similarity between documents in a click graph. We experiment with three different web page classifiers and show empirically that the proposed approach outperforms state-of-the-art methods and reduces the human effort needed to label training data.
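The idea of propagating labels through a click graph can be illustrated with a minimal sketch. It is not the method from the paper, whose details the abstract does not give. It assumes that the click log is a list of (query, document) pairs, that document similarity is the cosine between per-document query-click vectors, and that an unlabeled document takes the label of its most similar seed document once the similarity reaches a threshold. The names `click_vectors`, `cosine`, `propagate_labels`, and the `threshold` parameter are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def click_vectors(click_log):
    """Map each document to a {query: click_count} vector (assumed log format)."""
    vectors = defaultdict(lambda: defaultdict(int))
    for query, doc in click_log:
        vectors[doc][query] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {query: count} vectors."""
    dot = sum(count * v[q] for q, count in u.items() if q in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_labels(click_log, seed_labels, threshold=0.5):
    """Return seed labels plus labels copied to sufficiently similar documents.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    vectors = click_vectors(click_log)
    augmented = dict(seed_labels)
    for doc, vec in vectors.items():
        if doc in seed_labels:
            continue
        best_label, best_sim = None, threshold
        for seed, label in seed_labels.items():
            if seed not in vectors:
                continue  # seed document never appears in the click log
            sim = cosine(vec, vectors[seed])
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            augmented[doc] = best_label
    return augmented


if __name__ == "__main__":
    log = [
        ("cheap flights", "a.com"),
        ("cheap flights", "b.com"),
        ("python tutorial", "c.com"),
        ("python tutorial", "d.com"),
        ("weather", "e.com"),
    ]
    seeds = {"a.com": "travel", "c.com": "programming"}
    print(propagate_labels(log, seeds))
```

In this toy log, `b.com` shares its only query with the seed `a.com` and `d.com` shares its only query with `c.com`, so both inherit a label, while `e.com` shares no query with any seed and stays unlabeled. The augmented dictionary could then serve as training data for any supervised classifier.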