Methods of Relevance Ranking and Hit-content Generation in Math Search

  • Authors:
  • Abdou S. Youssef

  • Affiliations:
  • Department of Computer Science, The George Washington University, Washington DC, 20052, USA

  • Venue:
  • Calculemus '07 / MKM '07 Proceedings of the 14th symposium on Towards Mechanized Mathematical Assistants: 6th International Conference
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

To be effective and useful, math search systems must not only maximize precision and recall, but also present the query hits in a form that makes it easy for the user to identify quickly the truly relevant hits. To meet that requirement, the search system must sort the hits according to domain-appropriate relevance criteria, and provide with each hit a query-relevant summary of the hit target.The standard relevance measures in text search, which rely mostly on keyword frequencies and document sizes, turned out to be inadequate in math search. Therefore, alternative relevance measures must be defined, which give more weight to certain types of information than to others and take into account cross-reference statistics. In this paper, new, multidimensional relevance metrics are defined for math search, methods for computing and implementing them are discussed, and comparative performance evaluation results are presented.Query-relevant hit-summary generation is another factor that enables users to quickly determine the relevance of the presented hits. Although the hit title accompanied by a few leading sentences from the target document is simple to produce, this often fails to convey to the user the document's relevant excerpts. This shifts the burden onto the user to pursue many of the hits, and read significant portions of their target documents, to finally locate the wanted documents. Clearly, this task is too time-consuming and should be largely automated. This paper presents query-relevant hit-summary generation methods, outlines implementation strategies, and presents performance evaluation results.