Batch document filtering using nearest neighbor algorithm

Authors:
Ali Mustafa Qamar;Eric Gaussier;Nathalie Denos
Affiliations:
Laboratoire d'Informatique de Grenoble and Université Joseph Fourier;Laboratoire d'Informatique de Grenoble and Université Joseph Fourier;Laboratoire d'Informatique de Grenoble and Université Pierre Mendès France
Venue:
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Year:
2009

Abstract

We propose a batch algorithm to learn category-specific thresholds in a multiclass environment where a document can belong to more than one class. The algorithm uses the k-nearest neighbor algorithm to filter 100,000 documents into 50 profiles. The experiments were run on the English corpus and gave a macro precision of 0.256 and a macro recall of 0.295. We participated in the online task of INFILE 2008, where we used an online algorithm that relied on feedback from the server. Compared with INFILE 2008, the macro recall is significantly better in 2009 (0.295 vs. 0.260), whereas the macro precision was higher in 2008 (0.306 vs. 0.256). Furthermore, the anticipation in 2009 was 0.43, compared with 0.307 in 2008. We also provide a detailed comparison between the batch and online algorithms.
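
The general idea of k-nearest-neighbor filtering with one threshold per category can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the scoring rule, or how thresholds are chosen, so the cosine similarity, the summed-similarity score, the F1-based threshold selection, and all function names below (cosine, knn_scores, learn_thresholds, filter_doc) are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(doc, train, k):
    """Score each category by summing similarities of the k nearest neighbors.

    `train` is a list of (vector, set_of_categories) pairs; a training
    document may carry several categories (multi-label setting).
    """
    neighbors = sorted(
        ((cosine(doc, vec), labels) for vec, labels in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, labels in neighbors:
        for category in labels:
            scores[category] += sim
    return scores


def learn_thresholds(valid, train, categories, k):
    """Pick, per category, the score cut-off that maximizes F1 on `valid`.

    Assumption: F1 on held-out documents is the selection criterion.
    A category with no useful cut-off keeps an infinite threshold,
    so it is never assigned.
    """
    scored = [(knn_scores(vec, train, k), labels) for vec, labels in valid]
    thresholds = {}
    for category in categories:
        pairs = [(s.get(category, 0.0), category in labels) for s, labels in scored]
        best_t, best_f1 = float("inf"), 0.0
        for t in sorted({score for score, _ in pairs if score > 0}):
            tp = sum(1 for score, y in pairs if score >= t and y)
            fp = sum(1 for score, y in pairs if score >= t and not y)
            fn = sum(1 for score, y in pairs if score < t and y)
            f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
            if f1 > best_f1:
                best_t, best_f1 = t, f1
        thresholds[category] = best_t
    return thresholds


def filter_doc(doc, train, thresholds, k):
    """Return every category whose k-NN score reaches its learned threshold."""
    scores = knn_scores(doc, train, k)
    return {c for c, t in thresholds.items() if scores.get(c, 0.0) >= t}
```

In this sketch, thresholds are learned once from a fixed held-out set, which is what makes it a batch procedure; an online variant would instead update each threshold as feedback arrives. Because each category has its own cut-off, a document can be assigned to several categories or to none.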