Identification of Similar Documents Using Coherent Chunks

  • Authors:
  • Sobha Lalitha Devi;Sankar Kuppan;Kavitha Venkataswamy;Pattabhi R. Rao

  • Affiliations:
  • AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India (all authors)

  • Venue:
  • DAARC '09 Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium on Anaphora Processing and Applications
  • Year:
  • 2009

Abstract

We focus on automatically finding similar documents using coherent chunks. The similarity between documents is determined by identifying the coherent chunks present in them. We apply linguistic rules to identify the coherent chunks and use the Vector Space Model (VSM) to determine the similarity among documents. We have taken patent documents from the USPTO for this work. This method of using coherent chunks to identify similar documents has shown encouraging results.
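
The VSM step described in the abstract can be illustrated with a minimal sketch. It assumes the coherent chunks have already been extracted, since the abstract does not specify the linguistic rules; the TF-IDF weighting, the whitespace tokenization, the sample chunks, and the function names (`chunk_terms`, `tf_idf_vectors`, `cosine`) are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def chunk_terms(chunks):
    """Count lowercase whitespace-separated terms across a document's chunks."""
    counts = Counter()
    for chunk in chunks:
        counts.update(chunk.lower().split())
    return counts


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document.

    Each document is a list of chunk strings. The smoothed IDF used here
    is an assumption; the abstract does not state a weighting scheme.
    """
    n = len(docs)
    tfs = [chunk_terms(chunks) for chunks in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # document frequency: one count per document
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in tf.items()} for tf in tfs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, hand-written chunks standing in for rule-based chunker output.
docs = [
    ["a rotor blade assembly", "the blade reduces vibration"],
    ["a rotor blade mount", "the mount reduces vibration"],
    ["a pharmaceutical compound", "the compound treats inflammation"],
]
vectors = tf_idf_vectors(docs)

if __name__ == "__main__":
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            print(f"doc {i} vs doc {j}: {cosine(vectors[i], vectors[j]):.3f}")
```

In this toy input the two rotor-blade documents share most of their chunk terms, so they score higher against each other than either does against the pharmaceutical one. Restricting the vocabulary to terms that occur inside chunks, rather than the whole text, is the part of the sketch meant to mirror the abstract's idea.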