A hybrid two-stage approach for discipline-independent canonical representation extraction from references

  • Authors:
  • Sung Hee Park; Roger W. Ehrich; Edward A. Fox

  • Affiliations:
  • Digital Library Research Laboratory, Virginia Tech, Blacksburg, VA, USA; Center for Human Computer Interaction, Virginia Tech, Blacksburg, VA, USA; Digital Library Research Laboratory, Virginia Tech, Blacksburg, VA, USA

  • Venue:
  • Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
  • Year:
  • 2012

Abstract

In education and research, references play a key role. However, extracting and parsing references are difficult problems. One difficulty is that many reference styles exist; given only a surface form, identifying which style was used is hard, especially in heterogeneous collections of theses and dissertations, which span many fields and disciplines and may mix styles within a single publication. We address these problems by drawing on suitable knowledge found on the Web. In particular, we investigate a two-stage classifier approach that involves multi-class classification with respect to reference styles, and we partially solve the problem of parsing surface representations of references. We present empirical evidence for the effectiveness of our approach and describe plans to improve our methods.
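
The abstract does not spell out the two stages, so the following is only a minimal sketch of one possible reading: first assign a reference string to a style class, then parse it with a style-specific pattern. The style labels, the surface cues in `classify_style`, the regular expressions, and the sample strings are all hypothetical illustrations, not the authors' classifier, features, or data.

```python
import re

# Hypothetical style-specific patterns (assumptions for illustration only).
STYLE_PATTERNS = {
    # e.g. "Doe, J. (2001). An example title. Example Journal."
    "apa": re.compile(
        r"^(?P<authors>.+?)\s\((?P<year>\d{4})\)\.\s"
        r"(?P<title>.+?)\.\s(?P<venue>.+?)\.?$"
    ),
    # e.g. '[1] J. Doe, "An example title," Example Journal, 2001.'
    "ieee": re.compile(
        r'^\[\d+\]\s(?P<authors>.+?),\s"(?P<title>.+?),"\s'
        r"(?P<venue>.+?),\s(?P<year>\d{4})\.?$"
    ),
}


def classify_style(reference: str) -> str:
    """Stage 1 (toy): pick a style class from simple surface cues.

    A real system would use a trained multi-class classifier here.
    """
    if re.match(r"^\[\d+\]", reference):
        return "ieee"
    if re.search(r"\(\d{4}\)\.", reference):
        return "apa"
    return "unknown"


def parse_reference(reference: str) -> dict:
    """Stage 2 (toy): parse the reference with the pattern for its style."""
    style = classify_style(reference)
    pattern = STYLE_PATTERNS.get(style)
    match = pattern.match(reference) if pattern else None
    fields = match.groupdict() if match else {}
    # The result is a canonical, style-independent field dictionary.
    return {"style": style, **fields}


# Made-up sample strings, not real citations.
apa_ref = "Doe, J. (2001). An example title. Example Journal."
ieee_ref = '[1] J. Doe, "An example title," Example Journal, 2001.'

for ref in (apa_ref, ieee_ref):
    print(parse_reference(ref))
```

Both sample strings map to the same title, venue, and year fields despite their different surface forms, which is the sense in which the output is canonical. Strings that match no known style fall through with only `style` set to `"unknown"`, mirroring the abstract's note that parsing is only partially solved.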