Multidimensional mining of large-scale search logs: a topic-concept cube approach

Authors:
Dongyeop Kang; Daxin Jiang; Jian Pei; Zhen Liao; Xiaohui Sun; Ho-Jin Choi
Affiliations:
Korea Advanced Institute of Science and Technology, Yuseong-gu, South Korea; Microsoft Research Asia, Beijing, China; Simon Fraser University, Burnaby, Canada; Nankai University, Tianjin, China; Microsoft Research Asia, Beijing, China; Korea Advanced Institute of Science and Technology, Yuseong-gu, South Korea
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011

Abstract

In addition to search queries and the corresponding click-through information, search engine logs record multidimensional information about user search activities, such as search time, location, vertical, and search device. Multidimensional mining of search logs can provide novel insights and useful knowledge for both search engine users and developers. In this paper, we describe our topic-concept cube project, which addresses the business need of supporting multidimensional mining of search logs effectively and efficiently. We address two challenges. First, search queries and click-through data are well recognized as sparse, and thus have to be aggregated properly for effective analysis. Second, there is often a gap between the topic hierarchies used in multidimensional aggregate analysis and the queries in search logs. To address these challenges, we develop a novel topic-concept model that learns a hierarchy of concepts and topics automatically from search logs. Enabled by the topic-concept model, we construct a topic-concept cube that supports online multidimensional mining of search log data. A distinctive feature of our approach is that, in addition to standard dimensions such as time and location, our topic-concept cube has a dimension of topics and concepts, which substantially facilitates the analysis of log data. To handle a huge amount of log data, we develop distributed algorithms for learning model parameters efficiently. We also devise approaches to computing a topic-concept cube. We report an empirical study verifying the effectiveness and efficiency of our approach on a real data set of 1.96 billion queries and 2.73 billion clicks.
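The idea of a cube with a topic dimension alongside time and location can be illustrated with a minimal sketch. This is not the authors' method. The query-to-concept and concept-to-topic tables below are hand-written, hypothetical stand-ins for the hierarchy that the paper's model learns from logs, and the toy log records, the dimension names, and the `build_cube` helper are all assumptions made for illustration. The sketch simply counts queries for every subset of dimensions, as a conventional data cube would.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written hierarchy (query -> concept -> topic).
# In the paper this hierarchy is learned from search logs; here it is assumed.
QUERY_TO_CONCEPT = {
    "cheap flights": "air travel",
    "airline tickets": "air travel",
    "hotel deals": "lodging",
    "nba scores": "basketball",
}
CONCEPT_TO_TOPIC = {
    "air travel": "travel",
    "lodging": "travel",
    "basketball": "sports",
}

# Toy log records (day, location, query); invented for illustration only.
LOG = [
    ("mon", "US", "cheap flights"),
    ("mon", "US", "airline tickets"),
    ("mon", "CA", "hotel deals"),
    ("tue", "US", "nba scores"),
    ("tue", "CA", "cheap flights"),
]

DIMENSIONS = ("day", "location", "topic")


def to_row(record):
    """Attach concept and topic labels to one raw log record."""
    day, location, query = record
    concept = QUERY_TO_CONCEPT[query]
    return {
        "day": day,
        "location": location,
        "concept": concept,
        "topic": CONCEPT_TO_TOPIC[concept],
    }


def build_cube(log, dimensions=DIMENSIONS):
    """Count queries for every subset of the given dimensions (each cuboid)."""
    rows = [to_row(record) for record in log]
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cube[dims] = Counter(tuple(row[d] for d in dims) for row in rows)
    return cube


cube = build_cube(LOG)
# Roll-up over day and location: query volume per topic.
print(cube[("topic",)])
# Slice of the (day, topic) cuboid: travel queries issued on Monday.
print(cube[("day", "topic")][("mon", "travel")])
```

Aggregating sparse individual queries into concepts and topics before counting is what makes each cell large enough to analyze. Materializing every cuboid, as done here, is feasible only for a toy log and is not a claim about how the paper computes its cube at scale.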