Improved search for socially annotated data

Authors:
Nikos Sarkas;Gautam Das;Nick Koudas
Affiliations:
University of Toronto;University of Texas at Arlington;University of Toronto
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

Social annotation is an intuitive, online, collaborative process through which each element of a collection of resources (e.g., URLs, pictures, videos) is associated with a group of descriptive keywords, widely known as tags. Each such group is a concise and accurate summary of the relevant resource's content and is obtained by aggregating the opinions of individual users, as expressed in the form of short tag sequences. The availability of this information gives rise to a new searching paradigm where resources are retrieved and ranked based on the similarity of a keyword query to their accompanying tags. In this paper, we present a principled and efficient search and resource ranking methodology that relies exclusively on the user-assigned tag sequences. Ranking is based on solid probabilistic foundations and our growing understanding of the dynamics and structure of the social annotation process, which we capture by employing interpolated n-gram models on the tag sequences. The efficiency and applicability of the proposed solution to large data sets is guaranteed through the introduction of a novel and highly scalable constrained optimization framework, employed both for training and for incrementally maintaining the n-gram models. We experimentally validate the efficiency and effectiveness of our solutions compared to other applicable approaches. Our evaluation is based on a large crawl of del.icio.us, numbering hundreds of thousands of users and millions of resources, thus demonstrating the applicability of our solutions to real-life, large-scale systems. In particular, we demonstrate that the use of interpolated n-grams for modeling tag sequences results in superior ranking effectiveness, while the proposed optimization framework is superior in terms of performance both for obtaining ranking parameters and for incrementally maintaining them.
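
To make the ranking idea concrete, the following is a minimal sketch of scoring resources with an interpolated bigram model over their tag sequences. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy data, the add-one smoothed background model, and the fixed interpolation weights `lambdas` are all hypothetical. In particular, the paper obtains its ranking parameters through a constrained optimization framework, whereas this sketch simply hard-codes the weights.

```python
import math
from collections import Counter

def train(sequences):
    """Count unigrams, bigrams, and bigram contexts for one resource."""
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)
        for prev, tag in zip(seq, seq[1:]):
            bigrams[(prev, tag)] += 1
            contexts[prev] += 1
    return unigrams, bigrams, contexts

def background_model(resources):
    """Collection-wide tag distribution with add-one smoothing (assumption)."""
    counts = Counter(
        tag for seqs in resources.values() for seq in seqs for tag in seq
    )
    total, vocab = sum(counts.values()), len(counts)

    def prob(tag):
        # One extra slot in the denominator reserves mass for unseen tags.
        return (counts[tag] + 1) / (total + vocab + 1)

    return prob

def log_score(query, model, background, lambdas=(0.5, 0.3, 0.2)):
    """Log-probability of the query under an interpolated bigram model.

    lambdas = (bigram, unigram, background) weights; fixed here for
    illustration, and assumed to sum to 1.
    """
    unigrams, bigrams, contexts = model
    l_bi, l_uni, l_bg = lambdas
    total = sum(unigrams.values())
    score, prev = 0.0, None
    for tag in query:
        p_uni = unigrams[tag] / total if total else 0.0
        if prev is not None and contexts[prev]:
            p_bi = bigrams[(prev, tag)] / contexts[prev]
        else:
            # No usable context: fall back to the unigram estimate.
            p_bi = p_uni
        score += math.log(l_bi * p_bi + l_uni * p_uni + l_bg * background(tag))
        prev = tag
    return score

def rank(query, resources, lambdas=(0.5, 0.3, 0.2)):
    """Return (resource, score) pairs sorted from best to worst."""
    background = background_model(resources)
    scored = [
        (name, log_score(query, train(seqs), background, lambdas))
        for name, seqs in resources.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy data: each resource maps to the tag sequences assigned by its users.
resources = {
    "url_a": [["python", "tutorial"],
              ["python", "programming", "tutorial"],
              ["python", "tutorial"]],
    "url_b": [["cooking", "recipes"],
              ["recipes", "dinner"],
              ["tutorial", "cooking"]],
}
ranking = rank(["python", "tutorial"], resources)
```

Interpolating with a background distribution keeps every query tag's probability nonzero, so a resource that lacks one query tag is penalized rather than excluded outright.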