A Scalable and Efficient Probabilistic Information Retrieval and Text Mining System

  • Authors:
  • Magnus Stensmo

  • Affiliations:
  • -

  • Venue:
  • ICANN '02 Proceedings of the International Conference on Artificial Neural Networks
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

A system for probabilistic information retrieval and text mining that is both scalable and efficient is presented. Separate feature extraction or stop-word lists are not needed since the system can remove unneeded parameters dynamically based on a local mutual information measure. This is shown to be as effective as using a global measure. A novel way ofstoring system parameters eliminates the need for a ranking step during information retrieval from queries. Probability models over word contexts provide a method to suggest related words that can be added to a query. Test results are presented on a categorization task and screen shots from a live system are shown to demonstrate its capabilities.