Improving web site search using web server logs

  • Authors:
  • Jin Zhou;Chen Ding;Dimitrios Androutsos

  • Affiliations:
  • Ryerson University, Toronto, ON, Canada;Ryerson University, Toronto, ON, Canada;Ryerson University, Toronto, ON, Canada

  • Venue:
  • CASCON '06 Proceedings of the 2006 conference of the Center for Advanced Studies on Collaborative research
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Despite the success of global search engines, web site search engines are still suffering from poor performance. Since a web site is different from the whole web in link structure, access pattern, and data scale, it is not always successful when the methods which improve the performance of web search are applied to web site search. In this paper, we propose a novel algorithm to improve the retrieval performance by using web server logs. Web server logs are grouped into different sessions and the relationships of web pages in the session are analyzed based on their similarities. Then, a new web page representation is generated. Anchor text is used to create another representation. They are combined with original text-based representation in web site search. Two kinds of combination methods are investigated and tested: combination of document representations and combination of ranking scores. Our experimental results show that our algorithm can improve the retrieval accuracy for the four retrieval models we tested: Inference Network Model, Okapi Model, Cosine Similarity Model and TFIDF Model. The highest performance increase from web log analysis is from TFIDF model, and overall, inference network model with web log information achieves the best result.