TSP and cluster-based solutions to the reassignment of document identifiers

  • Authors:
  • Roi Blanco;Álvaro Barreiro

  • Affiliations:
  • IRLab. Computer Science Department, University of Corunna, Spain;IRLab. Computer Science Department, University of Corunna, Spain

  • Venue:
  • Information Retrieval
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent studies demonstrated that it is possible to reduce Inverted Files (IF) sizes by reassigning the document identifiers of the original collection, as this lowers the distance between the positions of documents related to a single term. Variable-bit encoding schemes can exploit the average gap reduction and decrease the total amount of bits per document pointer. This paper presents an efficient solution to the reassignment problem, which consists in reducing the input data dimensionality using a SVD transformation, as well as considering it a Travelling Salesman Problem (TSP). We also present some efficient solutions based on clustering. Finally, we combine both the TSP and the clustering strategies for reordering the document identifiers. We present experimental tests and performance results in two text TREC collections, obtaining good compression ratios with low running times, and advance the possibility of obtaining scalable solutions for web collections based on the techniques presented here.