Combinatorial optimization: algorithms and complexity
Combinatorial optimization: algorithms and complexity
Agglomerative clustering of a search engine query log
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Clustering user queries of a search engine
Proceedings of the 10th international conference on World Wide Web
Query clustering using user logs
ACM Transactions on Information Systems (TOIS)
Introduction to Modern Information Retrieval
Introduction to Modern Information Retrieval
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Query Expansion by Mining User Logs
IEEE Transactions on Knowledge and Data Engineering
SIAM Journal on Discrete Mathematics
Query chains: learning to rank from implicit feedback
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Improving web search ranking by incorporating user behavior information
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search
ACM Transactions on Information Systems (TOIS)
A new rank correlation coefficient for information retrieval
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Context-aware query suggestion by mining click-through and session data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
A Bounded Index for Cluster Validity
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Contextual Ranking of Keywords Using Click Data
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
IEEE Transactions on Pattern Analysis and Machine Intelligence
Hi-index | 0.00 |
Clustering of search engine queries has attracted significant attention in recent years. Many search engine applications such as query recommendation require query clustering as a pre-requisite to function properly. Indeed, clustering is necessary to unlock the true value of query logs. However, clustering search queries effectively is quite challenging, due to the high diversity and arbitrary input by users. Search queries are usually short and ambiguous in terms of user requirements. Many different queries may refer to a single concept, while a single query may cover many concepts. Existing prevalent clustering methods, such as K-Means or DBSCAN cannot assure good results in such a diverse environment. Agglomerative clustering gives good results but is computationally quite expensive. This paper presents a novel clustering approach based on a key insight--search engine results might themselves be used to identify query similarity. We propose a novel similarity metric for diverse queries based on the ranked URL results returned by a search engine for queries. This is used to develop a very efficient and accurate algorithm for clustering queries. Our experimental results demonstrate more accurate clustering performance, better scalability and robustness of our approach against known baselines.