Calculating content recency based on timestamped and non-timestamped sources for supporting page quality estimation

  • Authors: Adam Jatowt; Yukiko Kawai; Katsumi Tanaka

  • Affiliations: Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, Japan and Microsoft IJARC Fellow; Kyoto Sangyo University, Motoyama, Kamigamo, Kita-Ku, Kyoto, Japan; Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto, Japan

  • Venue: Proceedings of the 2011 ACM Symposium on Applied Computing
  • Year: 2011

Abstract

The web has low publishing barriers, so its content varies widely in quality and credibility, and web searchers often find it difficult to locate high-quality content among returned search results. In this paper, we propose evaluating the extent to which search results contain recent information related to user queries. Our approach corroborates search results with query-related information obtained from timestamped and non-timestamped sources. It uses news articles collected from online news archives and also employs a simple search index mining process to find terms representing fresh topics. As a further contribution, we show how the proposed approach can be used to estimate the focus time of web pages, that is, the time periods to which their content refers. We demonstrate a proof-of-concept system that evaluates and visualizes, in real time, the freshness levels and focus time of web search results.
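
The abstract does not specify the scoring formulas, so the following is only a minimal illustrative sketch of the general idea of corroborating a page against timestamped news articles, not the authors' method. The function names (`tokenize`, `freshness_score`, `focus_month`), the Jaccard overlap measure, the exponential decay, and the `half_life_days` parameter are all assumptions made for this example.

```python
import re
from collections import Counter
from datetime import date


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def freshness_score(page_text, articles, today, half_life_days=30.0):
    """Recency-weighted term overlap between a page and timestamped articles.

    articles: list of (publication_date, text) pairs related to the query.
    Each article contributes its Jaccard overlap with the page, scaled by an
    exponential decay of the article's age (assumed weighting, hypothetical).
    The contributions are averaged over all articles.
    """
    page_terms = set(tokenize(page_text))
    if not page_terms or not articles:
        return 0.0
    total = 0.0
    for published, text in articles:
        terms = set(tokenize(text))
        union = page_terms | terms
        overlap = len(page_terms & terms) / len(union) if union else 0.0
        age_days = max((today - published).days, 0)
        total += overlap * 0.5 ** (age_days / half_life_days)
    return total / len(articles)


def focus_month(page_text, articles):
    """Return the (year, month) whose articles share the most terms with the page.

    Returns None when no article shares any term with the page.
    """
    page_terms = set(tokenize(page_text))
    votes = Counter()
    for published, text in articles:
        shared = len(page_terms & set(tokenize(text)))
        if shared:
            votes[(published.year, published.month)] += shared
    return votes.most_common(1)[0][0] if votes else None
```

Under these assumptions, a page that shares vocabulary with recently published articles scores higher than one that only matches old articles, and `focus_month` gives a coarse, month-level guess at the period a page refers to. Handling non-timestamped sources, such as the search index mining mentioned in the abstract, is outside the scope of this sketch.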