Clique Analysis of Query Log Graphs

Authors:
Alexandre P. Francisco;Ricardo Baeza-Yates;Arlindo L. Oliveira
Affiliations:
INESC-ID/IST, Technical University of Lisbon, Portugal;Yahoo! Research Barcelona, Spain & Santiago, Chile;INESC-ID/IST, Technical University of Lisbon, Portugal
Venue:
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Year:
2008

Abstract

In this paper, we propose a method for the analysis of very large graphs obtained from query logs, using query coverage inspection. The goal is to extract semantic relations between queries and their terms. We take a new approach to successfully and efficiently cluster these large graphs by analyzing clique overlap and a priori induced cliques. The clustering quality is evaluated with an extension of the modularity score. Results obtained with real data show that the identified clusters can be used to infer properties of the queries and interesting semantic relations between them and their terms. The quality of the semantic relations is evaluated using both a tf-idf-based score and data from the Open Directory Project. The proposed approach is also able to identify and filter out multitopical URLs, a feature that is interesting in itself.