DOCODE-lite: a meta-search engine for document similarity retrieval

Authors:
Felipe Bravo-Marquez;Gaston L'Huillier;Sebastián A. Ríos;Juan D. Velásquez;Luis A. Guerrero
Affiliations:
University of Chile, Department of Industrial Engineering, Santiago, Chile;University of Chile, Department of Industrial Engineering, Santiago, Chile;University of Chile, Department of Industrial Engineering, Santiago, Chile;University of Chile, Department of Industrial Engineering, Santiago, Chile;University of Chile, Department of Computer Science, Santiago, Chile
Venue:
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part II
Year:
2010

Abstract

The retrieval of similar documents from large-scale datasets has been one of the main concerns in knowledge management applications such as plagiarism detection, news impact analysis, and the matching of ideas within sets of documents. In all of these applications, a lightweight architecture is fundamental, given the large volume of information that must be analyzed. Furthermore, the relevance scores used for document retrieval can be significantly improved by combining several existing search engines and by taking into account relevance feedback from users. In this work, we propose a web-services architecture for the retrieval of similar documents from the web. We focus on the software engineering needed to incorporate users' knowledge into the retrieval algorithm. A human evaluation of the system's relevance feedback over a purpose-built set of documents is presented, showing that the proposed architecture can retrieve similar documents by using the main search engines. In particular, the system was evaluated on the document plagiarism detection task, and its main results are shown.
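The idea of combining several search engines and adjusting the combination from user feedback can be illustrated with a minimal sketch. The abstract does not specify the authors' scoring or feedback scheme, so the code below substitutes a generic technique, weighted reciprocal rank fusion, and a simple additive weight update; the function names `fuse_rankings` and `update_weights`, the constant `k`, the `rate` parameter, and the engine labels are all hypothetical.

```python
from collections import defaultdict


def fuse_rankings(rankings, weights=None, k=60):
    """Merge ranked result lists from several engines.

    Uses weighted reciprocal rank fusion (an assumed stand-in, not the
    paper's method): each engine adds weight / (k + rank) to a document.
    `rankings` maps an engine name to its ordered list of result URLs.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for engine, urls in rankings.items():
        w = weights.get(engine, 1.0)  # engines without a weight count as 1.0
        for rank, url in enumerate(urls, start=1):
            scores[url] += w / (k + rank)
    # Highest fused score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


def update_weights(weights, rankings, relevant, rate=0.1):
    """Raise the weight of engines that returned documents judged relevant.

    `relevant` is the set of URLs a user marked as relevant (hypothetical
    feedback format). Returns a new dict; the input is not modified.
    """
    new_weights = dict(weights)
    for engine, urls in rankings.items():
        hits = sum(1 for u in urls if u in relevant)
        new_weights[engine] = new_weights.get(engine, 1.0) + rate * hits
    return new_weights


if __name__ == "__main__":
    results = {
        "engine_a": ["x", "y", "z"],
        "engine_b": ["y", "x", "w"],
    }
    print(fuse_rankings(results))
    boosted = update_weights({}, results, relevant={"w"})
    print(fuse_rankings(results, boosted))
```

Rank-based fusion is used here because result lists from independent engines rarely expose comparable raw scores; any other merging rule could be substituted without changing the overall shape of the sketch.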