A collaborative approach to build evaluated web page datasets

  • Authors:
  • Ricardo Barros; José A. Rodrigues Nt.; Geraldo B. Xexéo; Jano M. de Souza

  • Affiliations:
  • COPPE, Graduate School of Engineering, UFRJ-Federal University of Rio de Janeiro, Caixa Postal: 68511 CEP: 21941-972, Rio de Janeiro, Brazil;COPPE, Graduate School of Engineering, UFRJ-Federal University of Rio de Janeiro, Caixa Postal: 68511 CEP: 21941-972, Rio de Janeiro, Brazil;COPPE, Graduate School of Engineering, UFRJ-Federal University of Rio de Janeiro, Caixa Postal: 68511 CEP: 21941-972, Rio de Janeiro, Brazil and Computer Science, Department-Institute of Mathemati ...;COPPE, Graduate School of Engineering, UFRJ-Federal University of Rio de Janeiro, Caixa Postal: 68511 CEP: 21941-972, Rio de Janeiro, Brazil

  • Venue:
  • Future Generation Computer Systems
  • Year:
  • 2011

Abstract

Information retrieval algorithms demand datasets to assess their effectiveness. However, access to such datasets is often difficult and expensive, since building them is a time-consuming and costly task. This work presents a collaborative approach to dataset creation that uses a data quality evaluation technique based on fuzzy theory to assist users in selecting suitable web documents for their datasets. These documents are automatically captured by a crawler and evaluated based on information derived from their metadata.