Combining and selecting characteristics of information use

  • Authors:
  • Ian Ruthven;Mounia Lalmas;Keith van Rijsbergen

  • Affiliations:
  • Univ. of Strathclyde, Glasgow;Univ. of London, London;Univ. of Glasgow, Glasgow

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2002

Abstract

Ruthven, Lalmas, and van Rijsbergen use traditional term importance measures: inverse document frequency (idf); noise, based upon in-document frequency; and term frequency. These are supplemented by theme value, which is calculated from differences between the expected positions of words in a text and their actual positions, on the assumption that even distribution indicates term association with a main topic, and by context, which is based on a query term's distance from the nearest other query term relative to the average expected distribution of all query terms in the document. They then define document characteristics: specificity, the sum of all idf values in a document over the total terms in the document, or document complexity, measured by the document's average idf value; and information-to-noise ratio (info-noise), tokens after stopping and stemming over tokens before these processes, measuring the ratio of useful to non-useful information in a document. Retrieval tests are then carried out using each characteristic, combinations of the characteristics, and relevance feedback to determine the correct combination of characteristics. Specificity and info-noise both rank a file independently of the query terms, but if the presence of a query term is required, unique rankings are generated.
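
The two document characteristics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the small stop list, the idf form log(N / df), and the function names are all assumptions made for illustration, and stemming is omitted because it does not change the token count.

```python
import math

# Hypothetical stop list, for illustration only (not from the paper).
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption)."""
    return text.lower().split()


def idf_table(docs):
    """Map each term to log(N / df), a common textbook idf (assumed form)."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(tokenize(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def specificity(doc, idf):
    """Sum of idf values over the document divided by its total terms."""
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    return sum(idf.get(t, 0.0) for t in tokens) / len(tokens)


def info_noise(doc):
    """Tokens remaining after stop-word removal over tokens before it."""
    tokens = tokenize(doc)
    if not tokens:
        return 0.0
    kept = [t for t in tokens if t not in STOP_WORDS]
    return len(kept) / len(tokens)


docs = ["the cat sat", "the dog"]
idf = idf_table(docs)
for d in docs:
    print(d, specificity(d, idf), info_noise(d))
```

Neither function takes a query, which matches the abstract's point that these two characteristics rank a file independently of the query terms.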