Filtering of Large Numbers of Unstructured Text Documents by the Developed Tool TEA

  • Authors:
  • Jan Ziska;Ales Bourek

  • Venue:
  • TSD '02 Proceedings of the 5th International Conference on Text, Speech and Dialogue
  • Year:
  • 2002

Abstract

This paper describes TEA (TExt Analyzer), a text-document-filtering software tool originally developed to help physicians select relevant documents from large numbers of unstructured medical text documents obtained from available Internet services. TEA learns which documents are interesting and relevant to an individual user, primarily by means of the naïve Bayes algorithm. In addition, TEA provides a number of functions that can improve its classification accuracy, allow more specific document selection for individual users, and enable users to work with dictionaries generated from the analyzed documents. The learning process is based on a set of text documents labeled as positive or negative examples; the labels are assigned by users interested in documents on certain, usually very specific, topics.
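
The learning scheme outlined in the abstract (a naïve Bayes classifier trained on documents that a user has labeled as positive or negative) can be illustrated with a minimal sketch. This is not TEA's actual implementation: the tokenizer, the Laplace (add-one) smoothing, the function names `tokenize`, `train`, `log_score`, and `is_relevant`, and the toy training documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    # labeled_docs: iterable of (text, label); label is True for a
    # positive (relevant) example and False for a negative one.
    # Assumes at least one example of each label is supplied.
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def log_score(model, text, label):
    # log P(label) + sum of log P(word | label), with add-one smoothing.
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / total_docs)
    for word in tokenize(text):
        if word in vocab:  # words never seen in training are ignored
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
    return score


def is_relevant(model, text):
    # Pick the label with the higher posterior log-score.
    return log_score(model, text, True) > log_score(model, text, False)


# Hypothetical toy data standing in for user-labeled documents.
examples = [
    ("insulin therapy for diabetes patients", True),
    ("diabetes glucose monitoring study", True),
    ("football match results", False),
    ("stock market news today", False),
]
model = train(examples)
print(is_relevant(model, "new insulin study for diabetes"))
print(is_relevant(model, "football stock news"))
```

Because each user supplies their own labeled examples, a separate model of this kind would be trained per user; working in log space avoids numerical underflow when documents are long.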