A Scalable and Efficient Probabilistic Information Retrieval and Text Mining System

Authors:
Magnus Stensmo
Affiliations:
-
Venue:
ICANN '02 Proceedings of the International Conference on Artificial Neural Networks
Year:
2002

Citing 3
Cited 1

A re-examination of text categorization methods

Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval

Information Retrieval
Machine Learning

Machine Learning

Factor matrix text filtering and clustering: Research Articles

Journal of the American Society for Information Science and Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

A system for probabilistic information retrieval and text mining that is both scalable and efficient is presented. Separate feature extraction or stop-word lists are not needed since the system can remove unneeded parameters dynamically based on a local mutual information measure. This is shown to be as effective as using a global measure. A novel way ofstoring system parameters eliminates the need for a ranking step during information retrieval from queries. Probability models over word contexts provide a method to suggest related words that can be added to a query. Test results are presented on a categorization task and screen shots from a live system are shown to demonstrate its capabilities.