A probabilistic model for text kernels

  • Authors:
  • Alain Lehmann;John Shawe-Taylor

  • Affiliations:
  • University of Southampton, United Kingdom;University of Southampton, United Kingdom

  • Venue:
  • ICML '06 Proceedings of the 23rd international conference on Machine learning
  • Year:
  • 2006

Abstract

This paper explores several kernels in the context of text classification. A novel view of how documents might have been created is introduced, and kernels are derived from this framework. The relations among these kernels, and between them and the Gaussian kernel, are discussed. Moreover, the popular tf-idf weighting scheme is derived as a natural consequence of the framework. Finally, the kernels are evaluated on the Reuters Corpus Volume I (RCV1) newswire database to assess their quality in a topic classification application.
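For orientation, the sketch below shows the two familiar baselines the abstract relates its kernels to: a linear (inner-product) kernel on tf-idf weighted term vectors, and a Gaussian kernel computed from the same vectors. It does not reproduce the paper's probabilistic derivation. It assumes the common weighting tf(t, d) * log(N / df(t)), which is only one of several tf-idf variants. The function names and the three toy documents are hypothetical illustrations and are not RCV1 data.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Assumed idf variant: log(N / df(t)), where df(t) counts documents containing t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    return {t: math.log(n / c) for t, c in df.items()}

def tfidf_vector(doc, idf):
    """Sparse tf-idf vector: raw term frequency times idf (unseen terms get weight 0)."""
    tf = Counter(doc)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def linear_kernel(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel exp(-||u - v||^2 / (2 sigma^2)), with the squared
    distance expanded into inner products."""
    sq_dist = linear_kernel(u, u) + linear_kernel(v, v) - 2 * linear_kernel(u, v)
    return math.exp(-sq_dist / (2 * sigma ** 2))

# Hypothetical toy corpus, already tokenized.
docs = [
    "oil prices rise as opec cuts output".split(),
    "opec output cut lifts oil prices".split(),
    "central bank holds interest rates".split(),
]
idf = idf_weights(docs)
vecs = [tfidf_vector(d, idf) for d in docs]

# Gram matrix of the linear tf-idf kernel over the toy corpus.
gram = [[linear_kernel(a, b) for b in vecs] for a in vecs]
```

The two documents about oil prices share weighted terms and receive a positive linear-kernel value, while the unrelated banking document scores zero against them. The Gaussian kernel turns the same inner products into a similarity in (0, 1].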