Subtopic structuring for full-length document access
SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Passage-level evidence in document retrieval
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Pivoted document length normalization
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Reexamining the cluster hypothesis: scatter/gather on retrieval results
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval
Finding related pages in the World Wide Web
WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)
A vector space model for automatic indexing
Communications of the ACM
Function-based object model towards website adaptation
Proceedings of the 10th international conference on World Wide Web
Evaluating strategies for similarity search on the web
Proceedings of the 11th international conference on World Wide Web
Modern Information Retrieval
Measuring Structural Similarity Among Web Documents: Preliminary Results
EP '98/RIDT '98 Proceedings of the 7th International Conference on Electronic Publishing, Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography: Electronic Publishing, Artistic Imaging, and Digital Typography
SimRank: a measure of structural-context similarity
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Improving pseudo-relevance feedback in web information retrieval using web page segmentation
WWW '03 Proceedings of the 12th international conference on World Wide Web
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
An empirical study on retrieval models for different document genres: patents and newspaper articles
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Language Modeling for Information Retrieval
Language Modeling for Information Retrieval
A bag of paths model for measuring structural similarity in Web documents
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
TextTiling: segmenting text into multi-paragraph subtopic passages
Computational Linguistics
Multi-paragraph segmentation of expository text
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Learning block importance models for web pages
Proceedings of the 13th international conference on World Wide Web
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
MRSSA: an iterative algorithm for similarity spreading over interrelated objects
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Cohesion and collocation: using context vectors in text segmentation
ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Better than the real thing?: iterative pseudo-query processing using cluster-based language models
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
PageRank without hyperlinks: structural re-ranking using links induced by language models
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A study of relevance propagation for web search
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Improving web search results using affinity graph
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Regularizing ad hoc retrieval scores
Proceedings of the 14th ACM international conference on Information and knowledge management
The SMART Retrieval System—Experiments in Automatic Document Processing
The SMART Retrieval System—Experiments in Automatic Document Processing
PageSim: a novel link-based measure of web page aimilarity
Proceedings of the 15th international conference on World Wide Web
Respect my authority!: HITS without hyperlinks, utilizing cluster-based language models
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Find-similar: similarity browsing as a search tool
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Using random walks for question-focused sentence retrieval
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
LexRank: graph-based lexical centrality as salience in text summarization
Journal of Artificial Intelligence Research
Factors affecting web page similarity
ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research
Re-ranking search results using document-passage graphs
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Utilizing passage-based language models for ad hoc document retrieval
Information Retrieval
Utilizing inter-passage and inter-document similarities for reranking search results
ACM Transactions on Information Systems (TOIS)
IEEE Transactions on Fuzzy Systems
Information Sciences: an International Journal
Which bug should I fix: helping new developers onboard a new project
Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering
Semi-supervised SimHash for efficient document similarity search
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Utilizing minimal relevance feedback for ad hoc retrieval
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Manifold-ranking based retrieval using k-regular nearest neighbor graph
Pattern Recognition
Hi-index | 0.00 |
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. a block of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are respectively employed to segment text documents and web pages into blocks. Then, each block is assigned with a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve the retrieval performances over baseline approaches. Document block is validated to be a better unit than the whole document in the manifold-ranking process.