Using position, fonts and cited references to retrieve scientific documents

  • Authors:
  • Yen-Liang Chen; Li-Chen Cheng; Yun-Ling Cheng

  • Affiliations:
  • Department of Information Management, National Central University, Taiwan, R.O.C.; Department of Information Management, National Central University, Taiwan, R.O.C.; Department of Information Management, National Central University, Taiwan, R.O.C.

  • Venue:
  • Journal of Information Science
  • Year:
  • 2007

Abstract

As more and more documents become available on the internet, finding documents that fit users' needs in databases containing millions of documents is becoming increasingly important. Because a scientific document is a structured text, it has useful features that can be exploited to improve retrieval performance. In this work, we investigate three such features: fonts, position and cited references. While past research has used each of these features individually to improve document searching, no existing research discusses how to integrate all three to improve retrieval performance. This work first investigates the relationships among the three features and then uses the discovered relationships to design a novel retrieval method. Extensive experiments with real scientific documents demonstrate its effectiveness. Our empirical results show that using the location (position) factor alone achieves the same performance as considering location and font factors simultaneously. We also observed that citation similarity is useful only when the similarity is high. Based on these two findings, we developed a method that combines the content vector and the reference vector conditionally, and this integrated approach does indeed improve search performance.
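The abstract's two findings (position weighting is enough on the content side, and references help only when citation similarity is high) suggest a conditional scoring rule. The Python below is a minimal sketch under stated assumptions, not the authors' method. The section names, the weights in the hypothetical `POSITION_WEIGHTS` table, the cosine measure, the hypothetical `threshold` and `alpha` parameters, and the query-by-example setup (the query is itself a document with references) are all illustrative choices that the abstract does not specify. The font factor is omitted because the abstract reports that it adds nothing beyond position.

```python
import math
from collections import Counter

# Hypothetical position weights (assumption): terms in more prominent
# sections count more. The abstract does not give the actual weights.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def content_vector(sections):
    """Term vector where each occurrence is weighted by the section it appears in."""
    vec = Counter()
    for section, text in sections.items():
        weight = POSITION_WEIGHTS.get(section, 1.0)
        for term in text.lower().split():
            vec[term] += weight
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(query_doc, candidate, threshold=0.5, alpha=0.5):
    """Hypothetical conditional combination: blend in reference similarity
    only when it reaches `threshold`; otherwise rank by content alone."""
    content_sim = cosine(content_vector(query_doc["sections"]),
                         content_vector(candidate["sections"]))
    ref_sim = cosine(Counter(query_doc["refs"]), Counter(candidate["refs"]))
    if ref_sim >= threshold:
        return alpha * content_sim + (1 - alpha) * ref_sim
    return content_sim


query = {"sections": {"title": "citation retrieval", "body": "fonts position"},
         "refs": ["r1", "r2"]}
shares_refs = {"sections": {"body": "fonts"}, "refs": ["r1", "r2"]}
no_shared_refs = {"sections": {"body": "fonts"}, "refs": ["r9"]}
print(combined_score(query, shares_refs), combined_score(query, no_shared_refs))
```

The two candidates have identical content, so the gap between their scores comes entirely from the reference term. A candidate whose reference overlap falls below the threshold is ranked exactly as if references were ignored, which mirrors the observation that low citation similarity is not informative.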