Identification of Similar Documents Using Coherent Chunks

Authors:
Sobha Lalitha Devi;Sankar Kuppan;Kavitha Venkataswamy;Pattabhi R. Rao
Affiliations:
AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India;AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India;AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India;AU-KBC Research Centre, MIT Campus of Anna University, Chennai, India
Venue:
DAARC '09 Proceedings of the 7th Discourse Anaphora and Anaphor Resolution Colloquium on Anaphora Processing and Applications
Year:
2009

Abstract

We focus on automatically finding similar documents using coherent chunks. Similarity between documents is determined by identifying the coherent chunks present in them. We apply linguistic rules to identify the coherent chunks and use the Vector Space Model (VSM) to determine the similarity among documents. The documents used in this work are patents from the USPTO. Using coherent chunks to identify similar documents has shown encouraging results.
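
To make the VSM step concrete, the following is a minimal sketch, not the authors' implementation. It assumes the coherent chunks have already been extracted (the linguistic rules are not given in the abstract, so chunks are passed in as plain strings), that each distinct chunk is one dimension of the vector space, and that weights are raw chunk frequencies compared by cosine similarity. The names `chunk_vector` and `cosine_similarity` and the sample chunks are hypothetical.

```python
import math
from collections import Counter


def chunk_vector(chunks):
    """Build a sparse frequency vector with one dimension per distinct chunk.

    Assumption: chunks are compared as lowercased strings; the paper's
    actual chunk normalization and weighting are not stated in the abstract.
    """
    return Counter(chunk.lower() for chunk in chunks)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse chunk vectors."""
    # Only chunks present in both documents contribute to the dot product.
    dot = sum(weight * vec_b[chunk] for chunk, weight in vec_a.items() if chunk in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical chunk lists standing in for two patent documents.
doc_a = ["fuel injection system", "combustion chamber", "fuel injection system"]
doc_b = ["fuel injection system", "exhaust valve"]

score = cosine_similarity(chunk_vector(doc_a), chunk_vector(doc_b))
```

Here `score` is higher the more chunks the two documents share, and is 0 when they share none. A weighting scheme such as tf-idf could replace the raw counts without changing the comparison step.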