Using bayesian priors to combine classifiers for adaptive filtering

  • Authors:
  • Yi Zhang

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

An adaptive information filtering system monitors a document stream to identify the documents that match information needs specified by user profiles. As the system filters, it also refines its knowledge about the user's information needs based on long-term observations of the document stream and periodic feedback(training data) from the user. Low variance profile learning algorithms, such as Rocchio, work well at the early stage of filtering when the system has very few training data. Low bias profile learning algorithms, such as Logistic Regression, work well at the later stage of filtering when the system has accumulated enough training data.However, an empirical system needs to works well consistently at all stages of filtering process. This paper addresses this problem by proposing a new technique to combine different text classification algorithms via a constrained maximum likelihood Bayesian prior. This technique provides a trade off between bias and variance, and the combined classifier may achieve a consistent good performance at different stages of filtering. We implemented the proposed technique to combine two complementary classification algorithms: Rocchio and logistic regression. The new algorithm is shown to compare favorably with Rocchio, Logistic Regression, and the best methods in the TREC-9 and TREC-11 adaptive filtering tracks.