Tree Indexing for Efficient Search of Similar Documents

  • Authors:
  • Chung-Min Chen; Duen-Ren Liu

  • Venue:
  • COMPSAC '00 24th International Computer Software and Applications Conference
  • Year:
  • 2000

Abstract

Linear algebra-based techniques have long been used to correlate similar documents. They map documents into a multidimensional vector space in which each document is represented by a vector. Searching for related documents then translates into searching for nearest neighbors in that vector space. In this paper, we propose an indexing structure, called the cosine R-tree, which indexes a multidimensional vector space and provides efficient nearest-neighbor search. Our preliminary results show that it outperforms a brute-force linear scan.
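
For orientation, the brute-force linear scan that the abstract names as its baseline can be sketched as below. This is a minimal illustration only, not the authors' code: it assumes cosine similarity as the closeness measure (suggested by the name of the index, but not stated in the abstract), and the function names, document IDs, and vectors are all hypothetical. It does not attempt to implement the cosine R-tree itself, whose construction the abstract does not describe.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def linear_scan_knn(query, documents, k=1):
    """Return the k documents most similar to `query` as (doc_id, score) pairs.

    Every document vector is compared with the query, so the cost grows
    linearly with the collection size. This is the baseline an index is
    meant to beat.
    """
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical toy collection: each document is a 3-dimensional term vector.
docs = {
    "d1": [1.0, 0.0, 1.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [2.0, 0.0, 1.9],
}

result = linear_scan_knn([1.0, 0.0, 1.0], docs, k=2)
print(result)
```

In this toy collection the query points in the same direction as d1 and nearly the same direction as d3, so those two rank ahead of d2, which is orthogonal to the query. A tree index aims to return the same neighbors while examining only part of the collection.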