We investigate tag and query logs to see whether the terms people use to annotate websites are similar to the terms they use to search for them. Over a set of URLs, we compare the distribution of tags used to annotate each URL with the distribution of query terms that led to clicks on the same URL. Understanding the relationship between the two distributions is important for determining how useful tag data may be for improving search results and, conversely, how useful query data may be for improving tag prediction. In our study, we compare the two term frequency distributions using vocabulary overlap and relative entropy. We also test statistically whether the term counts come from the same underlying distribution. Our results indicate that the vocabularies used for tagging and for searching are similar but not identical. We further investigate the content of the websites to see which of the two distributions (tag or query) is more similar to the content of the annotated/searched URL. Finally, we analyze the similarity for different categories of URLs in our sample to see whether the similarity between the distributions depends on the topic of the website or the popularity of the URL.
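To make the two comparison measures concrete, the following is a minimal Python sketch of vocabulary overlap and relative entropy for a single URL. It is an illustration under stated assumptions, not the procedure used in the study: the abstract does not say how overlap is normalized or how zero counts are handled, so the Jaccard ratio, the add-alpha smoothing, the function names, and the toy counts are all hypothetical choices.

```python
import math
from collections import Counter


def vocabulary_overlap(tag_counts, query_counts):
    """Jaccard overlap of the two vocabularies (assumed definition of overlap)."""
    tag_vocab, query_vocab = set(tag_counts), set(query_counts)
    union = tag_vocab | query_vocab
    if not union:
        return 0.0
    return len(tag_vocab & query_vocab) / len(union)


def relative_entropy(p_counts, q_counts, alpha=1.0):
    """Relative entropy D(P || Q) in bits.

    Add-alpha smoothing over the joint vocabulary is an assumption made here
    so that terms missing from one distribution do not yield infinite values.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for term in vocab:
        p = (p_counts.get(term, 0) + alpha) / p_total
        q = (q_counts.get(term, 0) + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence


# Made-up term counts for one hypothetical URL.
tag_counts = Counter({"python": 5, "tutorial": 3, "programming": 2})
query_counts = Counter({"python": 6, "tutorial": 2, "download": 2})

print("vocabulary overlap:", vocabulary_overlap(tag_counts, query_counts))
print("D(tags || queries):", relative_entropy(tag_counts, query_counts))
print("D(queries || tags):", relative_entropy(query_counts, tag_counts))
```

Relative entropy is asymmetric, so the sketch prints both directions; which direction (or a symmetrized variant) is appropriate depends on the question being asked of the tag and query distributions.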