A study of information retrieval on accumulative social descriptions using the generation features

  • Authors:
  • Lichun Yang;Shengliang Xu;Shenghua Bao;Dingyi Han;Zhong Su;Yong Yu

  • Affiliations:
  • Shanghai Jiao Tong University, Shanghai, China;Shanghai Jiao Tong University, Shanghai, China;IBM China Research Lab, Beijing, China;Shanghai Jiao Tong University, Shanghai, China;IBM China Research Lab, Beijing, China;Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper is concerned with the study of information retrieval (IR) on Accumulative Social Descriptions (ASDs). ASDs refer to Web texts that accumulated by many Web users describing certain Web resources, such as anchor texts, search logs and social annotations. There have been some studies working on leveraging ASDs for improving search performance in both internet and intranet. However, to the best of our knowledge, no prior study has concerned the specific generation features of ASDs, which are the focus point of this paper. Specifically, we consider the generation features from two perspectives, the generation processes and the generated distributions. Further, three probabilistic IR models are derived based on them. The three models are first demonstrated with one toy dataset and then empirically evaluated with two real datasets: an internet dataset consisting of 90,295 Web pages, with 25,845,818 social annotations crawled from Del.icio.us and 31,320,005 pieces of anchor texts crawled through Yahoo! API, and an intranet dataset consisting of 179,835 Web pages with 1,245,522 annotations dumped from the intranet tagging system in IBM, named as Dogear. Extensive experimental results show that the proposed methods, which fully leverage the generation features of ASDs, improve the performance of both internet and intranet search significantly.