The influence of personalization on tag query length in social media search

  • Authors:
  • M. Clements; A. P. de Vries; M. J. T. Reinders

  • Affiliations:
  • ICT Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands; ICT Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands and Centre for Mathematics and Computer ...; ICT Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2010

Abstract

Social content systems contain enormous collections of unstructured user-generated content, annotated by the collaborative effort of regular Internet users. Tag clouds have become popular interfaces that allow users to query the database of these systems by clicking relevant terms. However, these single-click queries are often not expressive enough to retrieve the desired content effectively. Users have to use multiple clicks or type longer queries to satisfy their information need. To enhance the predicted content ranking, we use a random walk model that integrates the user's preference and semantically related query terms. We use the collaborative annotations from a popular on-line book catalog to create a social annotation graph and study the effect of personalization and smoothing for increasing query lengths. We show that personalization and smoothing allow the user to find equally relevant content with fewer query terms compared to a frequency-based content ranking with TF-IDF weighting. As expected, we see that the influence of the random walk model disappears if users type more detailed queries. Finally, we discuss the observations with respect to synonyms and homographs, which are well known to hamper the performance of information retrieval systems.
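
To make the idea of a personalized random walk over a social annotation graph concrete, the sketch below ranks books for a one-term tag query. It is a minimal illustration under stated assumptions, not the model from the paper: the graph construction from (user, book, tag) triples, the random walk with restart, the `restart` and `personalization` parameters, and the toy data are all hypothetical choices made for this example. The toy data uses the homograph "python" (a programming language or a snake) to show how starting part of the walk at the user's own node can disambiguate a short query.

```python
from collections import defaultdict


def build_graph(annotations):
    """Build an undirected weighted graph from (user, book, tag) triples.

    Assumption for this sketch: every annotation links user-book,
    book-tag and user-tag with weight 1.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for user, book, tag in annotations:
        u, b, t = ("user", user), ("book", book), ("tag", tag)
        for a, c in ((u, b), (b, t), (u, t)):
            graph[a][c] += 1.0
            graph[c][a] += 1.0
    return graph


def random_walk_with_restart(graph, start, restart=0.3, steps=50):
    """Iterate a walk that jumps back to `start` with probability `restart`."""
    prob = dict(start)
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, mass in prob.items():
            total = sum(graph[node].values())
            # Spread the non-restarting mass over neighbours by edge weight.
            for nbr, weight in graph[node].items():
                nxt[nbr] += (1.0 - restart) * mass * weight / total
        for node, mass in start.items():
            nxt[node] += restart * mass
        prob = nxt
    return prob


def rank_books(graph, user, query_tags, personalization=0.5):
    """Rank books for a tag query; part of the start mass sits on the user."""
    start = {("user", user): personalization}
    share = (1.0 - personalization) / len(query_tags)
    for tag in query_tags:
        start[("tag", tag)] = start.get(("tag", tag), 0.0) + share
    scores = random_walk_with_restart(graph, start)
    books = [(name, s) for (kind, name), s in scores.items() if kind == "book"]
    return sorted(books, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: "python" is a homograph shared by two unrelated books.
annotations = [
    ("alice", "learning_python", "python"),
    ("alice", "clean_code", "programming"),
    ("carol", "learning_python", "programming"),
    ("bob", "snakes_of_asia", "python"),
    ("bob", "reptile_care", "reptiles"),
    ("dave", "snakes_of_asia", "reptiles"),
]
graph = build_graph(annotations)

for reader in ("alice", "bob"):
    top_book, _ = rank_books(graph, reader, ["python"])[0]
    print(reader, "->", top_book)
```

With `personalization=0.0` the two "python" books are indistinguishable in this symmetric toy graph, which mirrors the point that a single-click query alone may not be expressive enough. Longer queries (for example `["python", "programming"]`) shift the start distribution toward the intended sense even without personalization.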