Evaluation of Text Clustering Algorithms with N-Gram-Based Document Fingerprints

  • Authors:
  • Javier Parapar; Álvaro Barreiro

  • Affiliations:
  • IRLab, Computer Science Department, University of A Coruña, Spain (both authors)

  • Venue:
  • ECIR '09: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
  • Year:
  • 2009

Abstract

This paper presents a new approach that reduces the computational load of existing clustering algorithms by shrinking the documents with fingerprinting methods before clustering them. A thorough evaluation was performed over three different collections using four different metrics. The proposed approach to document clustering achieved good effectiveness while yielding considerable savings in memory space and computation time.
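The abstract does not say which fingerprinting scheme is used, so the following is only a minimal sketch of the general idea, assuming winnowing over hashed character n-grams. The parameters `n=5` and `w=4`, the normalisation step, the use of the first 8 bytes of MD5 as the hash, and Jaccard similarity as the comparison measure are all illustrative assumptions, not the authors' reported settings. Each document is reduced to a small set of hash values that can then stand in for the full text as clustering input.

```python
import hashlib
from typing import List, Set


def ngram_hashes(text: str, n: int) -> List[int]:
    """Hash every character n-gram of a normalised copy of the text."""
    # Assumed normalisation: lowercase and keep only alphanumeric characters.
    norm = "".join(ch for ch in text.lower() if ch.isalnum())
    hashes = []
    for i in range(len(norm) - n + 1):
        gram = norm[i:i + n].encode("utf-8")
        # Assumed hash: first 8 bytes of MD5, read as a 64-bit integer.
        hashes.append(int.from_bytes(hashlib.md5(gram).digest()[:8], "big"))
    return hashes


def winnow(hashes: List[int], w: int) -> Set[int]:
    """Keep the minimum hash of every window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) <= w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprint(text: str, n: int = 5, w: int = 4) -> Set[int]:
    """Reduce a document to its winnowed n-gram fingerprint (hypothetical defaults)."""
    return winnow(ngram_hashes(text, n), w)


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Set overlap used here as a stand-in document similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    docs = [
        "Document clustering groups similar documents together.",
        "Document clustering groups similar documents together quickly.",
        "The weather in A Coruna is often rainy in winter.",
    ]
    fps = [fingerprint(d) for d in docs]
    for i, fp in enumerate(fps):
        print(i, len(fp), "fingerprint hashes")
    print("sim(0, 1) =", round(jaccard(fps[0], fps[1]), 2))
    print("sim(0, 2) =", round(jaccard(fps[0], fps[2]), 2))
```

Because each window contributes its minimum hash, any substring of at least `w + n - 1` characters shared by two documents produces at least one common fingerprint, while the number of stored hashes drops well below the number of n-grams. The resulting sets could be passed to any clustering algorithm that accepts a pairwise similarity, which is where the memory and time savings described above would come from.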