This paper presents a multi-level matching method for document retrieval (DR) that uses a hybrid document similarity. Documents are represented by a multi-level structure comprising a document level and a paragraph level. This representation is designed to model the underlying semantics more flexibly and accurately than conventional flat term histograms can. Matching between documents is then cast as an optimization problem using the Earth Mover's Distance (EMD). A hybrid similarity combines the global and local semantics of documents to improve retrieval accuracy. We conducted an extensive experimental study to verify the approach. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.
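To make the idea concrete, the following is a minimal sketch of how a document-level and a paragraph-level comparison might be combined. It is not the paper's implementation: the abstract does not specify the term weighting, the ground distance, the paragraph weights, or how the two levels are merged. The sketch assumes raw term counts, cosine distance between paragraph histograms as the ground distance, paragraph weights proportional to paragraph length, and a linear mix controlled by a hypothetical parameter `alpha`. The names `cosine_distance`, `emd`, and `hybrid_similarity` are illustrative. The EMD is solved as a small min-cost flow problem in pure Python, which is adequate only for short examples.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dot / (na * nb)))  # clamp rounding error


def emd(weights_a, weights_b, dist, eps=1e-12):
    """Earth Mover's Distance via successive shortest paths (min-cost flow)."""
    n, m = len(weights_a), len(weights_b)
    src, snk, size = n + m, n + m + 1, n + m + 2
    edges = []  # [target, residual capacity, cost]; k and k ^ 1 are paired
    graph = [[] for _ in range(size)]

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0.0, -cost])

    for i, w in enumerate(weights_a):
        add_edge(src, i, w, 0.0)
    for j, w in enumerate(weights_b):
        add_edge(n + j, snk, w, 0.0)
    for i in range(n):
        for j in range(m):
            add_edge(i, n + j, math.inf, dist[i][j])

    total = min(sum(weights_a), sum(weights_b))
    remaining, work = total, 0.0
    while remaining > eps:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        best = [math.inf] * size
        prev = [-1] * size
        best[src] = 0.0
        for _ in range(size - 1):
            changed = False
            for u in range(size):
                if best[u] == math.inf:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and best[u] + cost < best[v] - 1e-15:
                        best[v], prev[v], changed = best[u] + cost, k, True
            if not changed:
                break
        if prev[snk] == -1:
            break  # no capacity left to route
        # Find the bottleneck along the path, then push flow through it.
        push, v = remaining, snk
        while v != src:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = snk
        while v != src:
            edges[prev[v]][1] -= push
            edges[prev[v] ^ 1][1] += push
            v = edges[prev[v] ^ 1][0]
        work += push * best[snk]
        remaining -= push
    return work / total if total > 0 else 0.0


def hybrid_similarity(doc_a, doc_b, alpha=0.5):
    """doc_a, doc_b: lists of non-empty paragraph strings.

    alpha is an assumed mixing weight, not a value taken from the paper.
    """
    paras_a = [Counter(p.lower().split()) for p in doc_a]
    paras_b = [Counter(p.lower().split()) for p in doc_b]
    # Document level: compare the flat term histograms of the whole texts.
    global_sim = 1.0 - cosine_distance(sum(paras_a, Counter()),
                                       sum(paras_b, Counter()))
    # Paragraph level: EMD between the two sets of paragraph histograms.
    len_a = [sum(p.values()) for p in paras_a]
    len_b = [sum(p.values()) for p in paras_b]
    w_a = [x / sum(len_a) for x in len_a]
    w_b = [x / sum(len_b) for x in len_b]
    dist = [[cosine_distance(p, q) for q in paras_b] for p in paras_a]
    local_sim = 1.0 - emd(w_a, w_b, dist)
    return alpha * global_sim + (1.0 - alpha) * local_sim
```

For example, `hybrid_similarity(["cats purr softly", "dogs bark loudly"], ["dogs bark loudly", "cats purr softly"])` compares two documents whose paragraphs appear in a different order. Because the cosine ground distance lies in [0, 1] and the paragraph weights are normalized, the EMD also lies in [0, 1], so `1 - emd(...)` can be read as a paragraph-level similarity. For realistic document lengths, a linear-programming or dedicated transportation solver would be a more practical choice than this pure-Python routine.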