A scalable and efficient system for probabilistic information retrieval and text mining is presented. Separate feature extraction and stop-word lists are not needed, since the system removes unneeded parameters dynamically based on a local mutual information measure. This local measure is shown to be as effective as a global one. A novel way of storing system parameters eliminates the need for a ranking step when retrieving information from queries. Probability models over word contexts provide a method for suggesting related words that can be added to a query. Test results are presented on a categorization task, and screenshots from a live system are shown to demonstrate its capabilities.
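The abstract does not define the local mutual information measure, so the following is only a minimal sketch of the general idea, assuming the local measure is the per-pair contribution p(w,c) log(p(w,c) / (p(w) p(c))) to the mutual information between words and categories. The function names `local_mi_terms` and `prune`, the threshold, and the toy documents are hypothetical and are not taken from the system described.

```python
import math
from collections import Counter


def local_mi_terms(docs):
    """Return {(word, category): local MI contribution}.

    docs is an iterable of (category, list_of_words) pairs.
    Assumption: "local" means the single-term contribution
    p(w,c) * log(p(w,c) / (p(w) * p(c))); the source does not say.
    """
    joint = Counter()
    for category, words in docs:
        for word in words:
            joint[(word, category)] += 1

    total = sum(joint.values())
    word_counts = Counter()
    cat_counts = Counter()
    for (word, category), n in joint.items():
        word_counts[word] += n
        cat_counts[category] += n

    terms = {}
    for (word, category), n in joint.items():
        p_joint = n / total
        p_word = word_counts[word] / total
        p_cat = cat_counts[category] / total
        terms[(word, category)] = p_joint * math.log(p_joint / (p_word * p_cat))
    return terms


def prune(terms, threshold):
    """Keep only parameters whose local contribution exceeds the threshold."""
    return {pair: value for pair, value in terms.items() if value > threshold}


# Toy data: "the" is spread evenly over both categories, so it carries
# no information about the category and is pruned like a stop word.
docs = [
    ("sports", ["the", "ball", "the", "goal"]),
    ("finance", ["the", "bank", "the", "loan"]),
]
terms = local_mi_terms(docs)
kept = prune(terms, threshold=0.0)
```

In this toy example the pairs involving "the" have a contribution of zero and are dropped without any stop-word list, while category-specific words such as "ball" and "bank" are kept. A real system would presumably apply such a test incrementally as counts are updated rather than in one batch pass.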