Social annotation is an intuitive, online, collaborative process through which each element of a collection of resources (e.g., URLs, pictures, or videos) is associated with a group of descriptive keywords, widely known as tags. Each such group is a concise and accurate summary of the corresponding resource's content, obtained by aggregating the opinions of individual users, which they express as short tag sequences. The availability of this information gives rise to a new search paradigm in which resources are retrieved and ranked by the similarity of a keyword query to their accompanying tags. In this paper, we present a principled and efficient search and resource ranking methodology that relies exclusively on the user-assigned tag sequences. Ranking is based on solid probabilistic foundations and on our growing understanding of the dynamics and structure of the social annotation process, which we capture by applying interpolated n-gram models to the tag sequences. The efficiency and applicability of the proposed solution to large data sets is guaranteed through a novel and highly scalable constrained optimization framework, used both for training the n-gram models and for maintaining them incrementally. We experimentally validate the efficiency and effectiveness of our solutions against other applicable approaches. Our evaluation is based on a large crawl of del.icio.us, comprising hundreds of thousands of users and millions of resources, which demonstrates the applicability of our solutions to real-life, large-scale systems. In particular, we demonstrate that modeling tag sequences with interpolated n-grams yields superior ranking effectiveness, and that the proposed optimization framework performs better both when obtaining the ranking parameters and when maintaining them incrementally.
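To make the ranking idea concrete, the following is a minimal sketch of scoring a keyword query against each resource with an interpolated bigram model built from that resource's tag sequences. It is an illustration under stated assumptions, not the paper's implementation: the function names (`train`, `tag_prob`, `query_log_prob`, `rank`), the choice of bigrams, the collection-wide background distribution, and the fixed interpolation weights are all hypothetical. In particular, the abstract states that the ranking parameters are obtained through a constrained optimization framework, which this sketch does not reproduce.

```python
from collections import Counter
import math


def train(tag_sequences):
    """Collect unigram, bigram, and bigram-context counts for one resource."""
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for seq in tag_sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
        contexts.update(seq[:-1])  # tags that are followed by another tag
    return unigrams, bigrams, contexts


def tag_prob(tag, prev, model, background, lambdas):
    """Interpolate bigram, unigram, and background estimates for one tag."""
    unigrams, bigrams, contexts = model
    l_bi, l_uni, l_bg = lambdas
    if prev is None:
        # The first query tag has no predecessor: move the bigram weight
        # onto the unigram estimate so the weights still sum to one.
        l_bi, l_uni = 0.0, l_uni + l_bi
    total = sum(unigrams.values())
    p_uni = unigrams[tag] / total if total else 0.0
    p_bi = bigrams[(prev, tag)] / contexts[prev] if contexts[prev] else 0.0
    p_bg = background.get(tag, 0.0)
    return l_bi * p_bi + l_uni * p_uni + l_bg * p_bg


def query_log_prob(query, model, background, lambdas):
    """Log-probability of the query tag sequence under a resource's model."""
    log_p, prev = 0.0, None
    for tag in query:
        p = tag_prob(tag, prev, model, background, lambdas)
        if p == 0.0:
            return float("-inf")  # tag never seen anywhere in the collection
        log_p += math.log(p)
        prev = tag
    return log_p


def rank(query, resources, lambdas=(0.5, 0.3, 0.2)):
    """Return resource ids ordered by decreasing query log-probability.

    `lambdas` are placeholder weights (assumed, not fitted).
    """
    bg_counts = Counter(t for seqs in resources.values() for s in seqs for t in s)
    bg_total = sum(bg_counts.values())
    background = {t: c / bg_total for t, c in bg_counts.items()}
    scores = {
        rid: query_log_prob(query, train(seqs), background, lambdas)
        for rid, seqs in resources.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Here `resources` is assumed to map a resource id to the list of tag sequences users assigned to it, for example `{"a": [["python", "tutorial"]], "b": [["cooking", "recipes"]]}`. Because the bigram term depends on tag order, the queries `["python", "tutorial"]` and `["tutorial", "python"]` can rank the same resources differently, which is the kind of sequence structure an n-gram model captures and a bag-of-tags model would not.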