The Web is quickly moving from the era of search engines to the era of discovery engines. Whereas search engines help you find information you are looking for, discovery engines help you find things that you never knew existed. A common discovery technique is to automatically identify and display objects similar to ones previously viewed by the user. Core to this approach is an accurate method for identifying similar documents. In this paper, we present a new approach to identifying similar documents based on a conceptual tree-similarity measure. We represent each document as a concept tree using the concept associations obtained from a classifier. We then employ a tree-similarity measure based on tree edit distance to compute similarities between concept trees. Experiments on documents from the CiteSeer collection showed that our algorithm performed significantly better than document similarity based on the traditional vector space model.
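To make the idea of comparing concept trees by edit distance concrete, the following is a minimal sketch and not the measure evaluated in the paper. It assumes ordered, labeled trees, unit costs for inserting, deleting, and relabeling a node, and a simple normalization by the combined tree size. The helper names (`tree`, `forest_distance`, `similarity`) and the example concept labels are hypothetical.

```python
from functools import lru_cache

# Assumption: a tree is (label, children) and a forest is a tuple of trees.
def tree(label, *children):
    return (label, tuple(children))

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)          # insert every node of f2
    if not f2:
        return size(f1)          # delete every node of f1
    (l1, c1), (l2, c2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children take its place.
    delete = forest_distance(f1[:-1] + c1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + c2) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(c1, c2)
             + forest_distance(f1[:-1], f2[:-1])
             + (l1 != l2))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))

def similarity(t1, t2):
    """Assumed normalization: 1.0 for identical trees, lower otherwise."""
    total = size((t1,)) + size((t2,))
    return 1.0 - tree_edit_distance(t1, t2) / total

# Hypothetical concept trees for three documents.
doc_a = tree("CS", tree("IR", tree("Search")), tree("DB"))
doc_b = tree("CS", tree("IR", tree("Ranking")), tree("DB"))
doc_c = tree("CS", tree("DB"))

print(tree_edit_distance(doc_a, doc_b))  # 1: relabel Search -> Ranking
print(tree_edit_distance(doc_a, doc_c))  # 2: delete IR and Search
print(similarity(doc_a, doc_b))          # 0.875
```

The memoized recursion is exponential in the worst case and is meant only for small illustrative trees; a real system would likely use a polynomial-time tree edit distance algorithm, and concept weights from the classifier could replace the unit costs assumed here.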