Evaluating document-to-document relevance based on document language model: modeling, implementation and performance evaluation

  • Authors:
  • Ge Yu;Xiaoguang Li;Yubin Bao;Daling Wang

  • Affiliations:
School of Information Science and Engineering, Northeastern University, Shenyang, P.R. China (all authors)

  • Venue:
  • CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2005

Abstract

Evaluating document-to-document relevance is important to many advanced applications such as IR, text mining, and natural language processing. Since document relevance is hard to define mathematically on account of uncertainty in users' needs, the concept of topical relevance is widely accepted across these research fields. It suggests that a document relevance model should explain how the document representation describes the document's topical content and how the matching method reveals topical differences among documents. However, current document-to-document relevance models, such as the vector space model and string distance, do not explicitly emphasize the perspective of topical relevance. This paper exploits a document language model to represent document topical content, explains why it can reveal document topics, and then establishes two distributional similarity measures based on the document language model to evaluate document-to-document relevance. Experiments on a TREC test collection compare the proposed approach with the vector space model, and the results show that the Kullback-Leibler divergence measure with Jelinek-Mercer smoothing significantly outperforms the vector space model.
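To illustrate the kind of measure the abstract describes, the sketch below computes a Jelinek-Mercer smoothed unigram language model for each document and compares two documents with Kullback-Leibler divergence. This is a minimal illustrative implementation, not the authors' code; the function names, the interpolation weight `lam=0.5`, and the use of the concatenated collection as the background model are assumptions for the example.

```python
import math
from collections import Counter

def jm_smoothed_lm(doc_tokens, collection_tokens, lam=0.5):
    """Jelinek-Mercer smoothed unigram model (illustrative):
    p(w|d) = (1 - lam) * p_ml(w|d) + lam * p(w|C),
    where p_ml is the maximum-likelihood estimate from the document
    and p(w|C) is the background collection model."""
    doc_counts = Counter(doc_tokens)
    coll_counts = Counter(collection_tokens)
    doc_len = sum(doc_counts.values())
    coll_len = sum(coll_counts.values())
    # Estimate a probability for every word in the collection vocabulary;
    # interpolation with p(w|C) keeps every probability strictly positive.
    return {w: (1 - lam) * doc_counts.get(w, 0) / doc_len
               + lam * coll_counts[w] / coll_len
            for w in coll_counts}

def kl_divergence(p, q):
    """KL(p || q) over the shared vocabulary; smoothing guarantees q(w) > 0,
    so the log is always defined. Lower divergence = more similar topics."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)

# Toy usage: the collection is the concatenation of both documents.
collection = "a b c a b d".split()
p1 = jm_smoothed_lm("a b a".split(), collection)
p2 = jm_smoothed_lm("c d".split(), collection)
score = kl_divergence(p1, p2)  # smaller score would mean closer topics
```

Because smoothing assigns nonzero probability to every vocabulary word, the divergence is finite even when the two documents share no terms, which is one practical reason smoothed language models are preferred over raw term-frequency comparisons.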