Measuring pairwise document similarity is critical to many text retrieval and mining tasks. The most popular measure of document similarity is the Cosine measure in the Vector Space Model. In this paper, we propose a new similarity measure based on optimal matching in graph theory. The proposed measure takes the structural information of a document into account by considering how words are distributed over its text segments. It first computes the similarity of each pair of text segments drawn from the two documents, and then combines these pairwise scores into an overall document similarity by solving an optimal matching problem over the segments. We evaluate the measure in document similarity search experiments. The experimental results and a user study show that the proposed measure outperforms the Cosine measure.
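The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: segments are taken to be blank-line-separated paragraphs, segment pairs are scored with plain term-frequency cosine, the matching is found by brute force over permutations (practical only for a handful of segments), and the total matching weight is normalized by the smaller segment count. The function names `cosine`, `segment`, and `optimal_matching_similarity` are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(doc: str) -> List[Counter]:
    """Assumption: a segment is a blank-line-separated paragraph,
    tokenized by lowercasing and splitting on whitespace."""
    return [Counter(p.lower().split()) for p in doc.split("\n\n") if p.strip()]


def optimal_matching_similarity(doc_a: str, doc_b: str) -> float:
    """Score two documents by the best one-to-one matching of their segments."""
    segs_a, segs_b = segment(doc_a), segment(doc_b)
    if not segs_a or not segs_b:
        return 0.0
    # Keep the document with fewer segments on the left side of the matching.
    if len(segs_a) > len(segs_b):
        segs_a, segs_b = segs_b, segs_a

    # Step 1: similarity of every pair of segments (the bipartite edge weights).
    weights = [[cosine(x, y) for y in segs_b] for x in segs_a]

    # Step 2: maximum-weight one-to-one matching, found here by brute force.
    best = max(
        sum(weights[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(len(segs_b)), len(segs_a))
    )

    # Assumption: normalize by the smaller segment count so the score lies in [0, 1].
    return best / len(segs_a)
```

Under these assumptions, two documents containing the same paragraphs in a different order receive the same score as two identical documents, because the matching ignores segment order. For documents with many segments, the brute-force search would need to be replaced by a polynomial-time assignment solver such as the Hungarian (Kuhn-Munkres) algorithm.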