Searching for significant word associations in text documents using genetic algorithms

Authors:
Jan Žižka; Michal Šrédl; Aleš Bourek
Affiliations:
Faculty of Informatics, Department of Information Technologies, Masaryk University, Brno, Czech Republic; Faculty of Informatics, Department of Information Technologies, Masaryk University, Brno, Czech Republic; Faculty of Medicine, Department of Biophysics, Masaryk University, Brno, Czech Republic
Venue:
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Year:
2003

Abstract

This paper describes experiments that used genetic algorithms (GAs) to search for significant word associations (phrases) in unstructured text documents obtained from the Internet in a specialized area of medicine. From the document categorization point of view (here with two classes, relevant and irrelevant documents), GAs can evolve sets of word associations with assigned significance weights. The categorization was about as reliable as the naïve Bayes method using only individual words; in addition, the GAs provided phrases consisting of one, two, or three words. The selected phrases were quite meaningful from a human point of view.
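
The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of how weighted phrases might be evolved for two-class categorization, not the authors' method. Everything in it is an assumption made for illustration: the toy corpus, the decision rule (a document is relevant when its summed phrase weights are positive), the use of training accuracy as fitness, and the helper names `ngrams`, `fitness`, and `evolve`.

```python
import random


def ngrams(text, max_n=3):
    """Return the set of phrases made of 1..max_n consecutive words."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def fitness(weights, phrases, docs):
    """Fraction of documents classified correctly by a weight vector.

    A document counts as relevant when the summed weights of the
    phrases it contains are positive (an assumed decision rule).
    """
    hits = 0
    for doc_phrases, label in docs:
        score = sum(w for w, p in zip(weights, phrases) if p in doc_phrases)
        hits += (score > 0) == label
    return hits / len(docs)


def evolve(docs, phrases, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Evolve one significance weight per phrase with a simple generational GA."""
    rng = random.Random(seed)

    def fit(w):
        return fitness(w, phrases, docs)

    pop = [[rng.uniform(-1, 1) for _ in phrases] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fit)
            b = max(rng.sample(pop, 3), key=fit)
            cut = rng.randrange(1, len(phrases))  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):
                if rng.random() < mutation_rate:
                    child[i] += rng.gauss(0, 0.5)  # Gaussian mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)


# Hypothetical toy corpus: (text, is_relevant).
corpus = [
    ("chronic kidney disease treatment options", True),
    ("dialysis in chronic kidney disease", True),
    ("football league final score", False),
    ("cheap treatment for garden weeds", False),
]
docs = [(ngrams(text), label) for text, label in corpus]
phrases = sorted(set().union(*(d for d, _ in docs)))
best = evolve(docs, phrases)

# Print the phrases that push a document most strongly toward "relevant".
for weight, phrase in sorted(zip(best, phrases), reverse=True)[:5]:
    print(f"{weight:+.2f}  {phrase}")
print("training accuracy:", fitness(best, phrases, docs))
```

In this sketch the evolved weight vector doubles as the list of significant phrases: phrases with large positive weights argue for relevance, those with large negative weights argue against it. A realistic setup would score fitness on held-out documents rather than on the training set.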