Parallel Monte Carlo algorithms for information retrieval

Authors:
V. N. Alexandrov;I. T. Dimov;A. Karaivanova;C. J. K. Tan
Affiliations:
School of Computer Science, Cybernetics and EE, The University of Reading, Reading RG6 6AY, UK;Central Laboratory for Parallel Processing, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria;Central Laboratory for Parallel Processing, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria;Queens University Belfast, Belfast, UK
Venue:
Mathematics and Computers in Simulation - Special issue: 3rd IMACS seminar on Monte Carlo methods - MCM 2001
Year:
2003

References:
Matrices, Vector Spaces, and Information Retrieval. SIAM Review
Data Mining-Guest Editors' Introduction: From Serendipity to Science. Computer
Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1
Large-Scale SVD and Subspace-Based Methods for Information Retrieval. IRREGULAR '98 Proceedings of the 5th International Symposium on Solving Irregularly Structured Problems in Parallel
Implementation of Monte Carlo Algorithms for Eigenvalue Problem Using MPI. Proceedings of the 5th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Cited by:
intelligent library and tutoring system for brita in the PuBs project. CDVE'07 Proceedings of the 4th international conference on Cooperative design, visualization, and engineering
Coalescing executions for fast uncertainty analysis. Proceedings of the 33rd International Conference on Software Engineering
White box sampling in uncertain data processing enabled by program analysis. Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Abstract

In any data mining application, automated retrieval of text, and of text combined with images, is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k × k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI. We present results of experiments arising in textual retrieval of Web documents, as well as a comparison of the proposed stochastic methods.
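
The two ideas in the abstract (sampling a smaller matrix, and reducing singular values to an eigenvalue problem) can be illustrated with a minimal sketch. The abstract does not say which sampling distribution or eigenvalue estimator the authors use, so everything below is an assumption made for illustration: `sample_rows` uses length-squared row sampling, a standard randomized technique, and `largest_singular_value` uses plain deterministic power iteration on M^T M rather than a parallel Monte Carlo method. The function names and the toy matrix are hypothetical.

```python
import math
import random


def sample_rows(A, s, rng):
    """Draw s rows of A with probability proportional to squared row norm.

    Each picked row is rescaled by 1/sqrt(s * p_i) so that the sampled
    matrix S satisfies E[S^T S] = A^T A (length-squared sampling; an
    assumed scheme, not taken from the paper).
    """
    sq_norms = [sum(x * x for x in row) for row in A]
    total = sum(sq_norms)
    if total == 0.0:
        raise ValueError("matrix has no non-zero entries")
    # Zero rows get weight 0 and are therefore never selected.
    picks = rng.choices(range(len(A)), weights=sq_norms, k=s)
    S = []
    for i in picks:
        scale = 1.0 / math.sqrt(s * sq_norms[i] / total)
        S.append([scale * x for x in A[i]])
    return S


def largest_singular_value(M, iters=200):
    """Estimate sigma_1(M) as the square root of the dominant eigenvalue
    of M^T M, using power iteration (deterministic stand-in)."""
    n = len(M[0])
    # All-ones start vector; adequate for non-negative count matrices.
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        Mv = [sum(row[j] * v[j] for j in range(n)) for row in M]
        lam = sum(x * x for x in Mv)  # Rayleigh quotient v^T M^T M v
        w = [sum(M[i][j] * Mv[i] for i in range(len(M))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return math.sqrt(lam)


# Hypothetical 6-term x 4-document count matrix (rows = terms).
A = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 3.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 1.0],
    [3.0, 0.0, 0.0, 2.0],
    [0.0, 2.0, 1.0, 0.0],
]

rng = random.Random(2003)   # fixed seed so the run is repeatable
S = sample_rows(A, 3, rng)  # reduced matrix: 3 sampled, rescaled rows
print("sigma_1 of full matrix   :", largest_singular_value(A))
print("sigma_1 of sampled matrix:", largest_singular_value(S))
```

With this particular scheme every rescaled row has the same squared norm, so the sampled matrix keeps the Frobenius norm of the original exactly, and for a rank-one matrix the sampled estimate of the largest singular value matches the full one. For general matrices the estimate is only approximate and improves as more rows are sampled.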