Margin-based local regression for adaptive filtering

  • Authors:
  • Yiming Yang; Bryan Kisiel

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
  • Year:
  • 2003

Abstract

Adaptive information filtering remains an open challenge in information retrieval. One of the hardest problems is optimizing decision thresholds over time, given only partial relevance feedback on the documents the system retrieves, which arrive in chronological order. We developed a new approach, margin-based local regression, that automatically adjusts the thresholds using two sliding windows: one over the true positive examples for which the system predicted "yes" with respect to a particular class, and a second over the other documents processed by the system. Using the mean scores of the documents in the two windows, we monitor the temporal drift of the margin, which is a function of both the current classification model and the threshold calibration strategy and which suggests bounds for the optimal threshold at a given time. Evaluating this approach with a Rocchio-style classifier on the TREC 2001 and TREC 2002 adaptive filtering benchmark data sets, we obtained significant improvements in performance (measured using Fβ with β = 0.5) over a baseline system that did not adapt the threshold over time, as well as the best result reported to date on the TREC 2002 benchmark corpus for adaptive filtering evaluations. These empirical results suggest that it is important to optimize thresholds using both system-accepted and system-rejected documents rather than system-accepted documents alone, and to make the thresholding function temporally sensitive to the shifting centroids of on-topic and off-topic documents.
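The two-window idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the regression or the exact threshold formula, so the sketch assumes the threshold is simply interpolated between the two window means. The class name `MarginThreshold`, the helper `run_filter`, and the parameters `window_size`, `alpha`, and `initial_threshold` are all hypothetical names introduced for illustration.

```python
from collections import deque
from statistics import fmean


class MarginThreshold:
    """Illustrative sketch only: keeps the threshold between two window means.

    Assumption (not from the paper): threshold = lower + alpha * (upper - lower),
    where `upper` is the mean score of recent accepted-and-relevant documents
    and `lower` is the mean score of the other recently processed documents.
    """

    def __init__(self, window_size=5, alpha=0.5, initial_threshold=0.5):
        # Window over true positives that the system accepted.
        self.pos_scores = deque(maxlen=window_size)
        # Window over all other documents processed by the system.
        self.other_scores = deque(maxlen=window_size)
        self.alpha = alpha
        self.threshold = initial_threshold

    def decide(self, score):
        """Return True if the document is accepted under the current threshold."""
        return score >= self.threshold

    def update(self, score, accepted, relevant):
        """Record one processed document and recompute the threshold.

        Relevance feedback is only usable for accepted documents, so
        `relevant` is ignored when `accepted` is False.
        """
        if accepted and relevant:
            self.pos_scores.append(score)
        else:
            self.other_scores.append(score)

        # The margin is only defined once both windows hold at least one score.
        if self.pos_scores and self.other_scores:
            upper = fmean(self.pos_scores)
            lower = fmean(self.other_scores)
            self.threshold = lower + self.alpha * (upper - lower)
        return self.threshold


def run_filter(stream, **kwargs):
    """Process (score, is_relevant) pairs in order; return (accepted, threshold) per step."""
    filt = MarginThreshold(**kwargs)
    history = []
    for score, is_relevant in stream:
        accepted = filt.decide(score)
        threshold = filt.update(score, accepted, is_relevant)
        history.append((accepted, threshold))
    return history


if __name__ == "__main__":
    # Hypothetical classifier scores with their true relevance labels.
    demo_stream = [(0.9, True), (0.2, False), (0.7, True), (0.4, True)]
    for accepted, threshold in run_filter(demo_stream, window_size=2):
        print(accepted, round(threshold, 3))
```

Because both windows have a fixed length, old scores drop out as new documents arrive, so the interpolated threshold follows the drifting means rather than a global average. The rejected documents feed the lower window, which reflects the abstract's point that system-rejected documents also inform the threshold.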