TEA: A Text Analysis Tool for the Intelligent Text Document Filtering

Authors:
Jan Zizka;Ales Bourek;Ludek Frey
Venue:
TSD '00 Proceedings of the Third International Workshop on Text, Speech and Dialogue
Year:
2000

Abstract

This paper describes results achieved with TEA (TExt Analyzer), a text-document classification tool based on the naïve Bayes algorithm. TEA also provides a set of additional functions that help users fine-tune the text classifiers and improve classification accuracy, mainly by modifying the dictionaries generated during the training phase. The experiments described in the paper were aimed at supporting work with unstructured medical text documents downloaded from the Internet. Good and stable results (a classification accuracy of around 97%) were achieved when selecting documents from a particular area of interest among a large number of documents from different areas.
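
The general idea of a naïve Bayes text classifier whose per-class dictionaries can be adjusted after training can be illustrated with a minimal sketch. This is not TEA's implementation: the function names `train` and `classify`, the `ignored` parameter (standing in for a user-edited dictionary), whitespace tokenization, Laplace smoothing, and the toy documents are all assumptions made for illustration only.

```python
import math
from collections import Counter


def train(docs):
    """Build one word-count dictionary per class from (text, label) pairs."""
    dictionaries = {}          # label -> Counter of word frequencies
    class_counts = Counter()   # label -> number of training documents
    for text, label in docs:
        class_counts[label] += 1
        dictionaries.setdefault(label, Counter()).update(text.lower().split())
    return dictionaries, class_counts


def classify(text, dictionaries, class_counts, ignored=frozenset()):
    """Return the most probable label; words in `ignored` are treated as
    removed from every dictionary (a hypothetical form of dictionary editing)."""
    vocab = {w for counts in dictionaries.values() for w in counts} - set(ignored)
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in dictionaries.items():
        n_words = sum(c for w, c in counts.items() if w not in ignored)
        # Log prior plus the sum of Laplace-smoothed log likelihoods.
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are skipped
                score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for this sketch, not taken from the paper).
docs = [
    ("patient treatment clinical trial", "relevant"),
    ("clinical study of patient therapy", "relevant"),
    ("football match final score", "irrelevant"),
    ("stock market price report", "irrelevant"),
]
dictionaries, class_counts = train(docs)
label = classify("clinical trial of a new therapy", dictionaries, class_counts)
```

Passing a set such as `ignored={"of"}` mimics removing an uninformative word from the dictionaries before classification; how TEA itself exposes such modifications is not specified in the abstract.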