Modern Information Retrieval
Using web helper agent profiles in query generation
AAMAS '03 Proceedings of the second international joint conference on Autonomous agents and multiagent systems
Web Caching And Its Applications (Kluwer International Series in Engineering and Computer Science)
Finding near-duplicate web pages: a large-scale evaluation of algorithms
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval
Retrieving similar documents from the web
Journal of Web Engineering
Hypergeometric language models for republished article finding
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
A Text Similarity Meta-Search Engine Based on Document Fingerprints and Search Results Records
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01
A zipf-like distant supervision approach for multi-document summarization using wikinews articles
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Retrieving Web documents similar to a given document differs in many respects from information retrieval based on queries written by regular search engine users. This work proposes a new method for similar-document retrieval on the Web based on generative language models and meta search engines. Probabilistic language models serve as a random query generator for the given document. The generated queries are submitted to a customizable set of Web search engines. Once all results are gathered, they are evaluated by a proposed scoring function based on Zipf's law. The results show that the proposed query generation and scoring procedure solves the problem with acceptable levels of precision.
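The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact language model or scoring function, so everything below is an assumption made for illustration: a maximum-likelihood unigram model stands in for the probabilistic language model, and a rank-based weight of 1/rank^s stands in for the Zipf-based score. The names `unigram_model`, `generate_query`, `zipf_score`, and the parameter `s` are hypothetical, and the search engine calls are replaced by ranked lists passed in directly.

```python
import random
from collections import Counter


def unigram_model(text):
    """Assumed stand-in: maximum-likelihood unigram model over alphabetic tokens."""
    tokens = [t for t in text.lower().split() if t.isalpha()]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def generate_query(model, length, rng):
    """Sample distinct terms at random, weighted by their model probability."""
    terms = list(model)
    weights = [model[t] for t in terms]
    target = min(length, len(terms))
    query = []
    while len(query) < target:
        term = rng.choices(terms, weights=weights, k=1)[0]
        if term not in query:
            query.append(term)
    return query


def zipf_score(rankings, s=1.0):
    """Assumed Zipf-like score: each appearance at rank r contributes 1 / r**s."""
    scores = Counter()
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / rank ** s
    # Highest aggregated score first.
    return scores.most_common()


if __name__ == "__main__":
    document = "language models generate random queries for web document retrieval"
    model = unigram_model(document)
    query = generate_query(model, 3, random.Random(42))
    # Hypothetical ranked result lists, one per query/engine pair.
    rankings = [["u1", "u2"], ["u2", "u3"]]
    print(query)
    print(zipf_score(rankings))
```

In this sketch a document that appears near the top of several result lists accumulates a higher score than one that appears once or only at low ranks, which is one plausible reading of a Zipf-based scoring function, not necessarily the one used in the paper.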