Supervised Dual-PLSA for Personalized SMS Filtering

  • Authors:
  • Wei-Ran Xu;Dong-Xin Liu;Jun Guo;Yi-Chao Cai;Ri-Le Hu

  • Affiliations:
  • School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China 100876;School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China 100876;School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China 100876;School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China 100876;Nokia Research Center, Beijing, China 100176

  • Venue:
  • AIRS '09 Proceedings of the 5th Asia Information Retrieval Symposium on Information Retrieval Technology
  • Year:
  • 2009

Abstract

Because users rarely have the patience to provide enough labeled data, a personalized filter is expected to converge quickly. Topic-model-based dimension reduction can minimize the structural risk when training data is limited. In this paper, we propose a novel supervised dual-PLSA, which estimates topics from several kinds of observable data: labeled documents, unlabeled documents, and supervised information about topics. The c-w PLSA model is proposed first, in which word and class are observable variables and topic is latent. Then two generative models, c-w PLSA and typical PLSA, are combined so that they share observable variables and can make use of the other observed data. Furthermore, supervised information about topics is employed. The result is the supervised dual-PLSA. Experiments show that dual-PLSA converges very quickly. Within 100 gold-standard feedback instances, dual-PLSA's cumulative error rate drops to 9%. Its total error rate is 6.94%, the lowest among all the filters compared.
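
For readers unfamiliar with the "typical PLSA" component mentioned in the abstract, the following is a minimal sketch of standard document-word PLSA fitted by EM. It is not the authors' supervised dual-PLSA or their c-w model; the function names (`plsa`, `normalize`), the toy count matrix, the random initialization, and the iteration count are all assumptions chosen for illustration.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    if total == 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit standard PLSA by EM on a docs x words count matrix (list of lists).

    Returns (p_z_d, p_w_z):
      p_z_d[d][z] approximates P(topic z | document d)
      p_w_z[z][w] approximates P(word w | topic z)
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    # Random initialization (an assumption; any strictly positive start works).
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: posterior P(z | d, w), proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts n(d, w) * P(z | d, w) for the M-step.
                for z in range(n_topics):
                    new_zd[d][z] += n_dw * post[z]
                    new_wz[z][w] += n_dw * post[z]
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
    return p_z_d, p_w_z


# Toy corpus (hypothetical): 4 short messages over a 4-word vocabulary.
toy_counts = [
    [4, 3, 0, 0],
    [5, 2, 0, 1],
    [0, 0, 3, 4],
    [0, 1, 4, 3],
]
p_z_d, p_w_z = plsa(toy_counts, n_topics=2)
```

Each row of `p_z_d` is a low-dimensional topic representation of one message, which is the kind of reduced feature space a downstream filter could be trained on. How the paper couples this model with the c-w model and the supervised topic information is not specified in the abstract, so it is not sketched here.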