Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
The use of MMR, diversity-based reranking for reordering documents and producing summaries
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
MetaCost: a general method for making classifiers cost-sensitive
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Collection statistics for fast duplicate document detection
ACM Transactions on Information Systems (TOIS)
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Information Retrieval
Identifying and Filtering Near-Duplicate Documents
COM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
ACM SIGIR Forum
Beyond independent relevance: methods and evaluation metrics for subtopic retrieval
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Eye-tracking analysis of user behavior in WWW search
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating implicit measures to improve web search
ACM Transactions on Information Systems (TOIS)
LA-WEB '05 Proceedings of the Third Latin American Web Congress
Less is more: probabilistic models for retrieving fewer relevant documents
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search
ACM Transactions on Information Systems (TOIS)
Active exploration for learning rankings from clickthrough data
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
An experimental comparison of click position-bias models
WSDM '08 Proceedings of the 2008 International Conference on Web Search and Data Mining
Predicting diverse subsets using structural SVMs
Proceedings of the 25th international conference on Machine learning
SpotSigs: robust and efficient near duplicate detection in large web collections
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval
Introduction to Information Retrieval
Proceedings of the Second ACM International Conference on Web Search and Data Mining
Proceedings of the Second ACM International Conference on Web Search and Data Mining
An axiomatic approach for result diversification
Proceedings of the 18th international conference on World wide web
Portfolio theory of information retrieval
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Minimally invasive randomization for collecting unbiased preferences from clickthrough logs
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
An Effectiveness Measure for Ambiguous and Underspecified Queries
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory
Expected reciprocal rank for graded relevance
Proceedings of the 18th ACM conference on Information and knowledge management
The WEKA data mining software: an update
ACM SIGKDD Explorations Newsletter
Redundancy, diversity and interdependent document relevance
ACM SIGIR Forum
Here or there: preference judgments for relevance
ECIR'08 Proceedings of the IR research, 30th European conference on Advances in information retrieval
Adaptive near-duplicate detection via similarity learning
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
A user behavior model for average precision and its generalization to graded judgments
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Practical online retrieval evaluation
ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval
Reducing information redundancy in search results
Proceedings of the 28th Annual ACM Symposium on Applied Computing
User intent and assessor disagreement in web search evaluation
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Fidelity, Soundness, and Efficiency of Interleaved Comparison Methods
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.01 |
The web contains many duplicate and near-duplicate documents. Given that user satisfaction is negatively affected by redundant information in search results, a significant amount of research has been devoted to developing duplicate detection algorithms. However, most such algorithms rely solely on document content to detect duplication, ignoring the fact that a primary goal of duplicate detection is to identify documents that contain redundant information with respect to a particular user query. Similarly, although query-dependent result diversification algorithms compute a query-dependent ranking, they tend to do so on the basis of a query-independent content similarity score. In this paper, we bridge the gap between query-dependent redundancy and query-independent duplication by showing how user click behavior following a query provides evidence about the relative novelty of web documents. While most previous work on interpreting user clicks on search results has assumed that they reflect just result relevance, we show that clicks also provide information about duplication between web documents since users consider search results in the context of previously seen documents. Moreover, we find that duplication explains a substantial amount of presentation bias observed in clicking behavior. We identify three distinct types of redundancy that commonly occur on the web and show how click data can be used to detect these different types.