Efficient storage and retrieval of probabilistic latent semantic information for information retrieval

Authors:
Laurence A. Park; Kotagiri Ramamohanarao
Affiliations:
Department of Computer Science and Software Engineering, ARC Centre for Perceptive and Intelligent Machines in Complex Environments, The University of Melbourne, Melbourne, Australia (both authors)
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2009


Abstract

Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationships from a document set. The probabilistic latent semantic index (PLSI) has been used to store PLSA information, but the PLSI requires far more storage space than a simple term frequency index, which in turn leads to lengthy query times. To overcome the storage and speed problems of PLSI, we introduce the probabilistic latent semantic thesaurus (PLST), an efficient and effective method of storing the PLSA information. We show that through methods such as document thresholding and term pruning, we are able to maintain the high-precision results found using PLSA while using only a very small fraction (0.15%) of the storage space of PLSI.
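
The abstract does not spell out how the thesaurus is pruned or queried, so the following Python sketch is only one plausible reading of the general idea: store, for each term, a short list of its strongest related terms rather than a full set of relationships, and use that list to expand a query. The function names (`prune_thesaurus`, `expand_query`), the `top_k` parameter, the weighting scheme, and the toy probabilities are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple


def prune_thesaurus(
    term_probs: Dict[str, Dict[str, float]],
    top_k: int,
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep only the top_k strongest related terms for each term.

    term_probs maps a term to a dict of related-term relationship values
    (hypothetical; in practice these would be derived from a fitted model).
    """
    pruned: Dict[str, List[Tuple[str, float]]] = {}
    for term, related in term_probs.items():
        # Sort by descending value; break ties alphabetically for determinism.
        ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
        pruned[term] = ranked[:top_k]
    return pruned


def expand_query(
    query_terms: List[str],
    thesaurus: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, float]:
    """Build a weighted, expanded query from the pruned thesaurus.

    Each original query term contributes weight 1.0; each related term
    contributes its stored relationship value (an assumed weighting).
    """
    weights: Dict[str, float] = {}
    for term in query_terms:
        weights[term] = weights.get(term, 0.0) + 1.0
        for related, value in thesaurus.get(term, []):
            weights[related] = weights.get(related, 0.0) + value
    return weights


# Toy relationship values, invented for illustration only.
toy = {
    "car": {"car": 0.6, "automobile": 0.25, "engine": 0.1, "banana": 0.05},
    "fruit": {"fruit": 0.5, "banana": 0.3, "apple": 0.15, "car": 0.05},
}

thesaurus = prune_thesaurus(toy, top_k=2)
expanded = expand_query(["car"], thesaurus)
```

In this sketch the storage saving comes from keeping `top_k` entries per term instead of one entry per term pair; the expanded query could then be run against an ordinary term frequency index.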