A "Bag" or a "Window" of Words for Information Filtering?

  • Authors: Nikolaos Nanas; Manolis Vavalis

  • Affiliations: Centre for Research and Technology - Thessaly (CE.RE.TE.TH), Greece (both authors)

  • Venue: SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
  • Year: 2008

Abstract

Treating documents as a bag of words is the norm in Information Filtering. Syntactic and semantic correlations between terms are ignored; in other words, term independence is assumed. In this paper we challenge this common assumption. We use Nootropia, a user profiling model that applies a sliding window approach to capture term dependencies in a network and a spreading activation process to take them into account during document evaluation. Experiments based on TREC's routing guidelines demonstrate that, given an adequate window size, the additional information that term dependencies encode results in improved filtering performance over a traditional bag of words approach.
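
To make the contrast in the abstract concrete, the following minimal sketch compares a bag of words with links extracted by a sliding window. It is an illustration only, not Nootropia's implementation: the function names `bag_of_words` and `window_links`, the unweighted pair counting, and the toy documents are all assumptions, and the spreading activation step is not modelled.

```python
from collections import Counter


def bag_of_words(tokens):
    """Term counts only; word order and proximity are discarded."""
    return Counter(tokens)


def window_links(tokens, window_size):
    """Count unordered pairs of distinct terms that co-occur within
    `window_size` consecutive tokens (hypothetical, unweighted links)."""
    links = Counter()
    for i, term in enumerate(tokens):
        # Pair the term at position i with the terms that follow it
        # inside the window starting at i.
        for other in tokens[i + 1 : i + window_size]:
            if other != term:
                links[tuple(sorted((term, other)))] += 1
    return links


# Two toy documents with identical terms in a different order.
doc_a = "neural network training on social media".split()
doc_b = "social network training on neural media".split()

same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)
same_links = window_links(doc_a, 2) == window_links(doc_b, 2)
print(same_bag, same_links)
```

Under these assumptions the two documents are indistinguishable as bags of words, while a window of size 2 links "neural" to "network" only in the first document. Widening the window adds links between more distant terms, which is why the choice of window size matters.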