Hierarchical indexing and document matching in BoW

  • Authors:
  • Maayan Geffet; Dror G. Feitelson

  • Affiliations:
  • School of Computer Science and Engineering, The Hebrew University, 91904 Jerusalem, Israel; School of Computer Science and Engineering, The Hebrew University, 91904 Jerusalem, Israel

  • Venue:
  • Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries
  • Year:
  • 2001

Abstract

BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with.
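
The abstract does not spell out the algorithm's details, so the following Python sketch only illustrates the general idea of matching a text against competing topics at each node of a hierarchy. Everything specific in it is an assumption made for illustration and is not taken from the paper: raw term counts as keyword vectors, cosine similarity as the matching score, a greedy top-down descent, an extra weight of 3 for topic-title words, and the names `Topic`, `keyword_vector`, and `match`. Author-name handling, which the abstract also mentions, is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens, no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Topic:
    title: str
    entries: list = field(default_factory=list)   # texts of entries linked here
    children: list = field(default_factory=list)  # sub-topics

    def keyword_vector(self, title_weight=3):
        # Term counts over the whole subtree; title words count extra
        # (title_weight=3 is an arbitrary illustrative value).
        vec = Counter()
        for term in tokenize(self.title):
            vec[term] += title_weight
        for text in self.entries:
            vec.update(tokenize(text))
        for child in self.children:
            vec.update(child.keyword_vector(title_weight))
        return vec


def match(root, text):
    # Descend from the root, choosing the most similar child at each node.
    # Stop when no competing child shares any term with the text.
    query = Counter(tokenize(text))
    node, path = root, []
    while node.children:
        scored = [(cosine(query, c.keyword_vector()), c) for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score <= 0.0:
            break
        node = best
        path.append(node.title)
    return path


# Toy hierarchy, invented for this example.
root = Topic("root", children=[
    Topic("Operating systems", children=[
        Topic("Scheduling", entries=["gang scheduling of parallel jobs",
                                     "backfilling batch scheduler"]),
        Topic("File systems", entries=["log structured file system",
                                       "distributed file caching"]),
    ]),
    Topic("Information retrieval", children=[
        Topic("Indexing", entries=["inverted index compression",
                                   "hierarchical concept index"]),
        Topic("Ranking", entries=["vector space ranking of documents"]),
    ]),
])

print(match(root, "parallel job scheduling with backfilling"))
```

The same `match` function serves both uses described in the abstract: a search query returns a topic path rather than a list of entries, and a new entry's text can be matched to find the topic it should be linked to. A fuller version would likely return several candidate topics per level instead of committing to a single greedy choice.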