Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA

Authors:
Jan Ziska;Ales Bourek
Venue:
TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
Year:
2002

Citing 3
Cited 0

Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval
ECML '98 Proceedings of the 10th European Conference on Machine Learning

TEA: A Text Analysis Tool for the Intelligent Text Document Filtering
TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue

Automated Selection of Interesting Medical Text Documents by the TEA Text Analyzer
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing


Abstract

This paper describes TEA (TExt Analyzer), a software tool for filtering text documents. It was originally developed to help physicians select relevant items from large numbers of unstructured medical text documents obtained from available Internet services. TEA learns which documents are interesting and relevant to an individual user, primarily by means of the naïve Bayes algorithm. In addition, TEA provides a number of functions that can improve its classification accuracy, allow more specific document selection for individual users, and enable users to work with dictionaries generated from the analyzed documents. The learning process is based on a set of text documents labeled as positive or negative examples; the labels are supplied by users interested in documents on particular, usually very specific, topics.
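
To illustrate the kind of learning the abstract refers to, the following is a minimal sketch of a naïve Bayes document filter trained on positive and negative examples. It is not TEA's implementation: the function names (`tokenize`, `train`, `log_score`, `is_interesting`), the lowercase word tokenizer, the multinomial word model, and the add-one (Laplace) smoothing are all assumptions chosen for illustration, and the abstract does not state which variant TEA uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase alphabetic words; TEA's real tokenization is not described.
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Build word and document counts from (text, label) pairs.

    label is True for an interesting (positive) document, False otherwise.
    Assumes at least one example of each class is present.
    """
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocabulary


def log_score(text, label, model):
    """Log prior of the class plus the summed log likelihoods of the known words."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)
    for word in tokenize(text):
        if word in vocabulary:  # words never seen in training are ignored
            # Add-one smoothing keeps unseen word/class pairs from zeroing the product.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
    return score


def is_interesting(text, model):
    # Pick the class with the higher posterior score.
    return log_score(text, True, model) > log_score(text, False, model)
```

A user would call `train` on a list of labeled documents and then pass each new document to `is_interesting`. Treating words as independent given the class is what makes the classifier "naïve"; working in log space avoids numeric underflow on long documents.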