Batch document filtering using nearest neighbor algorithm

  • Authors:
  • Ali Mustafa Qamar;Eric Gaussier;Nathalie Denos

  • Affiliations:
  • Laboratoire d'Informatique de Grenoble and Université Joseph Fourier;Laboratoire d'Informatique de Grenoble and Université Joseph Fourier;Laboratoire d'Informatique de Grenoble and Université Pierre Mendès France

  • Venue:
  • CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
  • Year:
  • 2009

Abstract

In this paper, we propose a batch algorithm that learns category-specific thresholds in a multiclass setting where a document can belong to more than one class. The algorithm uses the k-nearest neighbor algorithm to filter 100,000 documents into 50 profiles. The experiments were run on the English corpus and yielded a macro precision of 0.256 and a macro recall of 0.295. We had previously participated in the online task of INFILE 2008, where we used an online algorithm that relied on feedback from the server. Compared with INFILE 2008, the macro recall is significantly better in 2009 (0.295 vs. 0.260), whereas the macro precision was higher in 2008 (0.306 vs. 0.256). Furthermore, the anticipation was 0.43 in 2009, compared with 0.307 in 2008. We also provide a detailed comparison between the batch and online algorithms.
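
The following is a minimal illustrative sketch of the general idea described in the abstract: scoring a document against each profile with k-nearest neighbors, keeping every profile whose score reaches its own threshold, and evaluating with macro-averaged precision and recall. It is not the authors' implementation. The abstract does not specify the similarity measure, the scoring rule, or how the thresholds are learned, so the cosine similarity, the summed-similarity score, the fixed threshold values, and all function names here are assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts (assumed measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_scores(doc, train, k):
    """Sum the similarities of the k nearest training documents, per category."""
    nearest = sorted(((cosine(doc, vec), cats) for vec, cats in train),
                     key=lambda pair: pair[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, cats in nearest:
        for c in cats:  # a training document may carry several labels
            scores[c] += sim
    return scores


def assign(doc, train, thresholds, k=3):
    """Return every category whose kNN score reaches its own threshold."""
    scores = knn_scores(doc, train, k)
    return {c for c, s in scores.items()
            if s >= thresholds.get(c, float("inf"))}


def macro_precision_recall(predicted, relevant, categories):
    """Average per-category precision and recall (0.0 for empty denominators)."""
    precisions, recalls = [], []
    for c in categories:
        pred = {d for d, cats in predicted.items() if c in cats}
        rel = {d for d, cats in relevant.items() if c in cats}
        tp = len(pred & rel)
        precisions.append(tp / len(pred) if pred else 0.0)
        recalls.append(tp / len(rel) if rel else 0.0)
    n = len(categories)
    return sum(precisions) / n, sum(recalls) / n


# Toy data, invented for illustration only.
train = [
    ({"football": 1.0, "goal": 1.0}, {"sport"}),
    ({"election": 1.0, "vote": 1.0}, {"politics"}),
    ({"football": 1.0, "election": 1.0}, {"sport", "politics"}),
]
thresholds = {"sport": 0.8, "politics": 0.8}  # hypothetical fixed values

print(assign({"football": 1.0}, train, thresholds, k=2))
```

In a batch setting, the per-category thresholds would be tuned on labeled training data before any test document is filtered, for example by choosing for each category the value that maximizes a chosen measure on held-out documents. That tuning step is omitted here because the abstract does not describe it.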