A statistical comparison of tag and query logs

Authors: Mark J. Carman, Mark Baillie, Robert Gwadera, Fabio Crestani
Affiliations: University of Lugano, Lugano, Switzerland (Carman, Gwadera, Crestani); University of Strathclyde, Glasgow, United Kingdom (Baillie)
Venue: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval
Year: 2009

Abstract

We investigate tag and query logs to see whether the terms people use to annotate websites are similar to the ones they use to search for them. Over a set of URLs, we compare the distribution of tags used to annotate each URL with the distribution of query terms for clicks on the same URL. Understanding the relationship between these distributions is important for determining how useful tag data may be for improving search results and, conversely, how useful query data may be for improving tag prediction. We compare the two term frequency distributions using vocabulary overlap and relative entropy, and we test statistically whether the term counts come from the same underlying distribution. Our results indicate that the vocabularies used for tagging and for searching content are similar but not identical. We further examine the content of the websites to determine which of the two distributions (tag or query) is more similar to the content of the annotated/searched URL. Finally, we analyze the similarity across different categories of URLs in our sample to see whether the similarity between distributions depends on the topic of the website or the popularity of the URL.
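The comparisons named in the abstract (vocabulary overlap, relative entropy, and a test of whether two sets of term counts share one underlying distribution) can be illustrated with a minimal sketch. The abstract does not say which overlap measure, smoothing scheme, or statistical test the authors used, so this sketch assumes Jaccard overlap, additive smoothing (so that relative entropy stays finite when a term appears in only one log), and Pearson's chi-square test of homogeneity as a stand-in test. All function names and the toy counts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def vocabulary_overlap(a: Counter, b: Counter) -> float:
    """Jaccard overlap between the sets of distinct terms with a positive count (assumed measure)."""
    va = {t for t, c in a.items() if c > 0}
    vb = {t for t, c in b.items() if c > 0}
    if not va and not vb:
        return 1.0
    return len(va & vb) / len(va | vb)


def shared_vocab(p_counts: Counter, q_counts: Counter) -> list:
    """Terms that occur at least once in either count table."""
    return sorted(t for t in set(p_counts) | set(q_counts)
                  if p_counts.get(t, 0) + q_counts.get(t, 0) > 0)


def smoothed(counts: Counter, vocab: list, alpha: float) -> dict:
    """Additively smoothed probabilities over the shared vocabulary, so no term gets probability 0."""
    total = sum(counts.get(t, 0) for t in vocab) + alpha * len(vocab)
    return {t: (counts.get(t, 0) + alpha) / total for t in vocab}


def relative_entropy(p_counts: Counter, q_counts: Counter, alpha: float = 0.5) -> float:
    """Kullback-Leibler divergence D(P || Q) in bits between the smoothed term distributions."""
    vocab = shared_vocab(p_counts, q_counts)
    p = smoothed(p_counts, vocab, alpha)
    q = smoothed(q_counts, vocab, alpha)
    return sum(p[t] * math.log2(p[t] / q[t]) for t in vocab)


def chi_square_homogeneity(p_counts: Counter, q_counts: Counter) -> tuple:
    """Pearson chi-square statistic and degrees of freedom for the 2 x V table of term counts.

    A stand-in test: large values suggest the two count vectors were not drawn
    from the same distribution. The chi-square approximation is unreliable
    when expected counts are small (sparse tag/query data).
    """
    vocab = shared_vocab(p_counts, q_counts)
    n_p = sum(p_counts.get(t, 0) for t in vocab)
    n_q = sum(q_counts.get(t, 0) for t in vocab)
    if n_p == 0 or n_q == 0:
        raise ValueError("both count tables must contain at least one term")
    n = n_p + n_q
    stat = 0.0
    for t in vocab:
        column = p_counts.get(t, 0) + q_counts.get(t, 0)
        for row_total, observed in ((n_p, p_counts.get(t, 0)), (n_q, q_counts.get(t, 0))):
            expected = row_total * column / n  # expected count if both rows share one distribution
            stat += (observed - expected) ** 2 / expected
    return stat, len(vocab) - 1  # (2 - 1) * (V - 1) degrees of freedom


if __name__ == "__main__":
    # Toy counts for a single URL (hypothetical, not from the paper's data).
    tags = Counter({"python": 5, "programming": 3, "tutorial": 2})
    queries = Counter({"python": 8, "tutorial": 4, "download": 1})
    print(vocabulary_overlap(tags, queries))  # 0.5
    print(relative_entropy(tags, queries))
    print(chi_square_homogeneity(tags, queries))
```

Because relative entropy is asymmetric, the direction matters when asking which log is closer to a page's text: one option under these assumptions is to compare `relative_entropy(content, tags)` with `relative_entropy(content, queries)` for the same smoothing parameter `alpha`. The statistic returned by `chi_square_homogeneity` would still need to be compared against a chi-square distribution with the returned degrees of freedom to obtain a p-value.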