NTLM: a time-enhanced language model based ranking approach for web search

  • Authors:
  • Xiaowen Li;Peiquan Jin;Xujian Zhao;Hong Chen;Lihua Yue

  • Affiliations:
  • School of Computer Science and Technology, University of Science and Technology of China, Hefei, China (all authors)

  • Venue:
  • WISE'10 Proceedings of the 2010 International Conference on Web Information Systems Engineering
  • Year:
  • 2010


Abstract

Time plays an important role in Web search, because most Web pages contain time information and many Web queries are time-related. However, traditional search engines give little consideration to the time information in Web pages; in particular, they do not take it into account when ranking search results. In this paper, we present NTLM, a new ranking algorithm for Web search based on a time-enhanced language model. First, we present an effective algorithm to extract 〈keyword, content time〉 pairs from Web pages, which associates each keyword in a Web page with an appropriate content time. Then we introduce a new concept, temporal tf, a time-constrained term frequency for each keyword. After that, we propose a time-enhanced language model that measures the similarity between temporal-textual queries and Web pages by combining textual relevance and temporal relevance. We compare NTLM with five competing algorithms on two datasets, using different types of queries and two metrics, MRR and NDCG, to evaluate performance. The experimental results show that in the step of extracting 〈keyword, content time〉 pairs, NTLM reaches a high precision of 93.2%, and in the ranking step, NTLM performs best in terms of both MRR and NDCG.
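The abstract does not give NTLM's formulas, so the following is only a minimal sketch of how a time-constrained term frequency and a combined textual/temporal language-model score could fit together. Everything in it is an assumption for illustration: the `TimeInterval` type, the hypothetical `temporal_tf` (counting occurrences of a keyword whose content time overlaps the query time), the use of Dirichlet-smoothed query likelihood for both components, and the interpolation weight `lam` and smoothing parameter `mu`. It is not the authors' model.

```python
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class TimeInterval:
    start: int  # e.g. a year; granularity is an assumption
    end: int

    def overlaps(self, other: "TimeInterval") -> bool:
        return self.start <= other.end and other.start <= self.end


# A page is modeled as its list of (keyword, content time) pairs,
# i.e. the output of the pair-extraction step, assumed given here.
Pair = tuple[str, TimeInterval]


def temporal_tf(pairs: list[Pair], keyword: str, query_time: TimeInterval) -> int:
    """Hypothetical temporal tf: occurrences of `keyword` whose
    associated content time overlaps the query's time interval."""
    return sum(1 for kw, t in pairs if kw == keyword and t.overlaps(query_time))


def score(pairs: list[Pair], query_terms: list[str], query_time: TimeInterval,
          collection_prob: dict[str, float], lam: float = 0.5,
          mu: float = 2000.0) -> float:
    """Hypothetical combined score: a linear interpolation of a textual and a
    temporal Dirichlet-smoothed log query likelihood."""
    doc_len = len(pairs)
    textual = 0.0
    temporal = 0.0
    for w in query_terms:
        p_c = collection_prob.get(w, 1e-6)  # background probability of w
        tf = sum(1 for kw, _ in pairs if kw == w)
        ttf = temporal_tf(pairs, w, query_time)
        textual += log((tf + mu * p_c) / (doc_len + mu))
        temporal += log((ttf + mu * p_c) / (doc_len + mu))
    return lam * textual + (1 - lam) * temporal
```

In this sketch, two pages with identical text but different content times get the same textual component, and only the temporal component separates them, which is the kind of behavior a temporal-textual query calls for.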
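The two evaluation metrics named in the abstract, MRR and NDCG, can be computed as in the sketch below. These are the standard textbook definitions; the abstract does not say which DCG variant (linear gain vs. 2^rel − 1) or cutoff the paper uses, so the linear-gain form and the parameter `k` here are assumptions. For simplicity, the ideal ranking is built from the gains of the returned list only.

```python
from math import log2


def mrr(ranked_relevance_lists: list[list[int]]) -> float:
    """Mean reciprocal rank over queries; each inner list holds binary
    relevance of the returned results in rank order."""
    total = 0.0
    for rels in ranked_relevance_lists:
        for rank, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_relevance_lists)


def dcg(gains: list[float], k: int) -> float:
    """Discounted cumulative gain at cutoff k, linear-gain variant."""
    return sum(g / log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg(gains: list[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

For example, two queries whose first relevant results appear at ranks 2 and 1 give an MRR of (1/2 + 1) / 2 = 0.75.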