Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
The Journal of Machine Learning Research
We propose a probabilistic formulation of PSI (Polynomial Semantic Indexing), which we call pPSI. PSI is a model that represents a latent semantic space as a polynomial form of the input vectors. It expresses high-order relationships among more than two vectors in the form of extended inner products. PSI employs a low-rank representation, which allows high-dimensional data to be handled without explicit dimension reduction or feature extraction. The proposed pPSI retains these advantages. The contributions of this paper are (1) to formulate a probabilistic expression of PSI (pPSI), (2) to propose a pPSI-based classifier, and (3) to demonstrate the feasibility of the pPSI classifier. We introduce a stochastic gradient descent training algorithm for pPSI that reduces both memory use and computational cost. Furthermore, pPSI has the potential to reach a better solution than PSI. The proposed method also supports model-based training and adaptation, such as MAP (maximum a posteriori) estimation, according to the amount of available data. To evaluate pPSI and its classifier, we conducted three experiments with artificial data and music data, comparing against multi-class SVM and boosting classifiers. The experiments show that the proposed method is feasible, especially when the dimension of the latent concept space is small.
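The "extended inner product" with a low-rank representation can be illustrated with a minimal sketch. It is not the paper's implementation: the factor matrices `U`, `V`, `Y`, the rank `r`, the dimension `D`, and the function name `low_rank_poly_score` are all assumptions made for illustration. The sketch assumes each input vector is projected into an r-dimensional latent space and that the score is the sum over latent dimensions of the elementwise product of the projections, so the full D x D (or D x D x D) weight tensor is never stored.

```python
import numpy as np

def low_rank_poly_score(factors, vectors):
    """Extended inner product: sum_i prod_k (W_k x_k)_i.

    factors[k] has shape (r, D_k); vectors[k] has shape (D_k,).
    Hypothetical helper, used here only to illustrate the idea.
    """
    # Project every input vector into the shared r-dimensional latent space.
    projections = [W @ x for W, x in zip(factors, vectors)]
    # Multiply the projections elementwise, then sum over latent dimensions.
    return float(np.sum(np.prod(projections, axis=0)))

rng = np.random.default_rng(0)
D, r = 50, 4  # assumed input dimension and latent rank
U, V, Y = (rng.standard_normal((r, D)) for _ in range(3))
q, d = rng.standard_normal(D), rng.standard_normal(D)

# Degree 2: equivalent to q^T (U^T V) d, without forming the D x D matrix.
score2 = low_rank_poly_score([U, V], [q, d])

# Degree 3: a relationship among three vectors (here d is used twice).
score3 = low_rank_poly_score([U, V, Y], [q, d, d])

# Explicit full-rank equivalents, built only to check the sketch.
full2 = float(q @ (U.T @ V) @ d)
tensor3 = np.einsum("ia,ib,ic->abc", U, V, Y)
full3 = float(np.einsum("abc,a,b,c->", tensor3, q, d, d))
```

With these assumed shapes, the low-rank form stores 3 x r x D parameters for the degree-3 term instead of D^3, which is the memory saving the low-rank representation is meant to provide.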