Dynamic agglomerative-divisive clustering of clickthrough data for collaborative web search

Authors:
Kenneth Wai-Ting Leung;Dik Lun Lee
Affiliations:
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong;Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
Venue:
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Year:
2010

Abstract

In this paper, we model clickthroughs as a tripartite graph involving users, queries, and concepts embodied in the clicked pages. We develop the Dynamic Agglomerative-Divisive Clustering (DADC) algorithm, which clusters the tripartite clickthrough graph to identify groups of similar users, queries, and concepts in support of collaborative web search. Because the clickthrough graph is updated frequently, DADC clusters the graph incrementally, whereas most traditional agglomerative methods recluster the whole graph from scratch. Moreover, clickthroughs are usually noisy and reflect the diverse interests of users, so traditional agglomerative clustering methods tend to generate large clusters when the clickthrough graph is large. DADC avoids generating large clusters by interleaving two phases, an agglomerative phase and a divisive phase. The agglomerative phase iteratively merges similar clusters to avoid generating sparse clusters. The divisive phase iteratively splits large clusters into smaller ones to maintain cluster coherence, and restructures the existing clusters so that DADC can incrementally update the affected clusters as new clickthrough data arrives.
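
The interleaving of merge and split phases described above can be illustrated with a small Python sketch. It is not the DADC algorithm from the paper, whose similarity measures and split criteria the abstract does not give. For illustration it assumes that clickthroughs arrive as (user, query, concept) triples, that only queries are clustered, that similarity is the Jaccard overlap of the users and concepts linked to each query, and that a cluster is split once it exceeds a fixed size. The names neighbours, agglomerate, divide, update, merge_threshold, and max_size are all hypothetical.

```python
from itertools import combinations

def neighbours(clicks):
    """Map each query to the users and concepts it is linked to in the graph."""
    links = {}
    for user, query, concept in clicks:
        links.setdefault(query, set()).update({("u", user), ("c", concept)})
    return links

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

def cluster_sim(c1, c2, links):
    """Assumed cluster similarity: Jaccard over the pooled links of the members."""
    f1 = set().union(*(links[q] for q in c1))
    f2 = set().union(*(links[q] for q in c2))
    return jaccard(f1, f2)

def agglomerate(clusters, links, merge_threshold):
    """Agglomerative phase: merge the most similar pair until none is similar enough."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: cluster_sim(clusters[p[0]], clusters[p[1]], links))
        if cluster_sim(clusters[i], clusters[j], links) < merge_threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

def divide(clusters, links, max_size):
    """Divisive phase (assumed rule): split any cluster larger than max_size in two."""
    done, todo = [], [set(c) for c in clusters]
    while todo:
        c = todo.pop()
        if len(c) <= max_size:
            done.append(c)
            continue
        # Seed the two halves with the least similar pair of members.
        a, b = min(combinations(sorted(c), 2),
                   key=lambda p: jaccard(links[p[0]], links[p[1]]))
        left, right = {a}, {b}
        for q in c - {a, b}:
            closer_to_a = jaccard(links[q], links[a]) >= jaccard(links[q], links[b])
            (left if closer_to_a else right).add(q)
        todo += [left, right]  # both halves are strictly smaller than c
    return done

def update(clusters, clicks, new_clicks, merge_threshold=0.5, max_size=3):
    """Incremental step: keep existing clusters, add new queries, run both phases."""
    clicks = clicks + new_clicks
    links = neighbours(clicks)
    known = set().union(*clusters)
    clusters = clusters + [{q} for q in sorted(links.keys() - known)]
    clusters = agglomerate(clusters, links, merge_threshold)
    return divide(clusters, links, max_size), clicks

# Toy data, made up for illustration only.
clicks = [("u1", "apple iphone", "phone"), ("u1", "iphone price", "phone"),
          ("u2", "apple pie", "recipe"), ("u2", "pie recipe", "recipe")]
clusters, seen = update([], [], clicks)
clusters, seen = update(clusters, seen, [("u1", "iphone case", "phone")])
print(clusters)
```

The second call to update starts from the clusters produced by the first, so only the new query is added as a singleton before the two phases run again. This mirrors the incremental idea in the abstract, although this sketch still compares every pair of clusters, whereas the paper speaks of updating only the affected clusters.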