Effective topic distillation with key resource pre-selection

  • Authors:
  • Yiqun Liu;Min Zhang;Shaoping Ma

  • Affiliations:
  • State Key Lab of Intelligent Tech. and Sys., Tsinghua University, Beijing, China;State Key Lab of Intelligent Tech. and Sys., Tsinghua University, Beijing, China;State Key Lab of Intelligent Tech. and Sys., Tsinghua University, Beijing, China

  • Venue:
  • AIRS'04: Proceedings of the 2004 International Conference on Asian Information Retrieval Technology
  • Year:
  • 2004

Abstract

Topic distillation aims at finding key resources, which are high-quality pages for certain topics. Based on an analysis of the non-content features of key resources, this paper introduces a pre-selection method for topic distillation. A decision tree is constructed to locate key resource pages using query-independent non-content features, including in-degree, document length, URL type, and two new features we identified through analysis of a site's self-link structure. Although the resulting page set contains only about 20% of the pages in the whole collection, it covers more than 70% of the key resources. Furthermore, retrieval on this page set yields an improvement of more than 60% over retrieval on all pages. These results were obtained using the TREC 2002 web track topic distillation task for training and the corresponding TREC 2003 task for testing. They show an effective way of achieving better topic distillation performance with a significantly smaller dataset.
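The abstract does not specify the tree-induction algorithm, the exact feature encoding, or the definition of the two self-link features, so the following is only a minimal sketch under stated assumptions. It trains a small CART-style tree with Gini splits (an assumed choice) on a hypothetical two-feature vector, [in_degree, url_type] with url_type coded 0 = root, 1 = subroot, 2 = path, 3 = file, uses it to pre-select pages, and then measures the two quantities the abstract reports: the fraction of the collection kept and the fraction of key resources covered. The toy data and all function names (`build_tree`, `preselect`, `size_and_coverage`) are illustrative, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical feature vector (assumption): [in_degree, url_type].
# The paper also uses document length and two self-link features, omitted here.
Features = Sequence[float]


@dataclass
class Node:
    feature: Optional[int] = None   # index of the feature tested; None at a leaf
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = 0                  # leaf prediction (1 = key resource)


def gini(labels: List[int]) -> float:
    """Binary Gini impurity: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(X: List[Features], y: List[int], depth: int = 0,
               max_depth: int = 3, min_size: int = 2) -> Node:
    # Ties go to class 1, leaning toward recall, since pre-selection
    # should keep as many key resources as possible.
    majority = 1 if 2 * sum(y) >= len(y) else 0
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return Node(label=majority)
    best: Optional[Tuple[float, int, float]] = None  # (impurity, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return Node(label=majority)
    _, f, t = best
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    return Node(
        feature=f, threshold=t,
        left=build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size),
        right=build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size),
    )


def predict(node: Node, x: Features) -> int:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def preselect(tree: Node, pages: Dict[str, Features]) -> Set[str]:
    """Keep only the pages the tree classifies as likely key resources."""
    return {pid for pid, x in pages.items() if predict(tree, x) == 1}


def size_and_coverage(selected: Set[str], all_pages: Dict[str, Features],
                      key_resources: Set[str]) -> Tuple[float, float]:
    """Fraction of the collection kept, and fraction of key resources kept."""
    size_ratio = len(selected) / len(all_pages)
    coverage = len(selected & key_resources) / len(key_resources) if key_resources else 0.0
    return size_ratio, coverage


# Toy training data (fabricated for illustration only).
X_train = [[50, 0], [40, 1], [30, 0], [2, 3], [1, 2], [3, 3], [0, 2], [5, 3]]
y_train = [1, 1, 1, 0, 0, 0, 0, 0]
tree = build_tree(X_train, y_train)

pages = {"a": [100, 0], "b": [4, 3], "c": [20, 2], "d": [1, 1]}
selected = preselect(tree, pages)
print(sorted(selected), size_and_coverage(selected, pages, {"a", "d"}))
```

On this toy data the tree splits on in-degree at 5, keeps pages `a` and `c`, and reports a size ratio of 0.5 with coverage of 0.5. The roughly 20% size and over 70% coverage figures come from the paper's TREC experiments, not from this sketch.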