Event detection from evolution of click-through data

  • Authors:
  • Qiankun Zhao;Tie-Yan Liu;Sourav S. Bhowmick;Wei-Ying Ma

  • Affiliations:
  • Pennsylvania State University & Nanyang Technological University, Singapore;Pennsylvania State University & Nanyang Technological University, Singapore;Microsoft Research Asia, Beijing, China;Microsoft Research Asia, Beijing, China

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Abstract

Previous efforts on event detection from the web have focused primarily on web content and structure data, ignoring the rich collection of web log data. In this paper, we propose the first approach to detect events from click-through data, which is the log data of web search engines. The intuition behind event detection from click-through data is that such data is often event-driven, and each event can be represented as a set of query-page pairs that are not only semantically similar but also have similar evolution patterns over time. Given the click-through data, our approach first segments it into a sequence of bipartite graphs based on a user-defined time granularity. Next, the sequence of bipartite graphs is represented as a vector-based graph, which records the semantic and evolutionary relationships between queries and pages. The vector-based graph is then transformed into its dual graph, where each node is a query-page pair that will be used to represent real-world events. The problem of event detection thus becomes equivalent to the problem of clustering the dual graph of the vector-based graph. The clustering process is based on a two-phase graph cut algorithm. In the first phase, query-page pairs are clustered based on semantic-based similarity such that each resulting cluster corresponds to a specific topic. In the second phase, query-page pairs related to the same topic are further clustered based on evolution pattern-based similarity such that each cluster is expected to represent a specific event under that topic. Experiments with real click-through data collected from a commercial web search engine show that the proposed approach produces high-quality results.
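
The segmentation step and the idea of comparing evolution patterns can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout `(timestamp, query, page)`, the integer timestamps, the function names `segment_clicks` and `cosine`, and the use of cosine similarity over per-window click counts are all assumptions made for illustration. The abstract does not specify the paper's actual similarity measures or its graph cut procedure, so neither is reproduced here.

```python
from collections import defaultdict
from math import sqrt


def segment_clicks(records, granularity):
    """Turn click records into one click-count vector per query-page pair.

    records: list of (timestamp, query, page) tuples with integer timestamps
             (a hypothetical layout, not taken from the paper).
    granularity: width of one time window, in the same unit as the timestamps.
    Returns (vectors, n_windows), where vectors maps (query, page) to a list
    holding the number of clicks observed in each window.
    """
    if not records:
        return {}, 0
    start = min(t for t, _, _ in records)
    end = max(t for t, _, _ in records)
    n_windows = (end - start) // granularity + 1
    vectors = defaultdict(lambda: [0] * n_windows)
    for t, query, page in records:
        window = (t - start) // granularity
        vectors[(query, page)][window] += 1
    return dict(vectors), n_windows


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy data: two query-page pairs clicked in different time windows.
records = [(0, "a", "p1"), (1, "a", "p1"), (5, "b", "p2"), (6, "a", "p1")]
vectors, n_windows = segment_clicks(records, granularity=3)
similarity = cosine(vectors[("a", "p1")], vectors[("b", "p2")])
```

In this toy input the two pairs are never clicked in the same window, so their evolution vectors are orthogonal. Under the abstract's description, such pairs would be unlikely to end up in the same event cluster even if they shared a topic.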