Searching for significant word associations in text documents using genetic algorithms

  • Authors:
  • Jan Žižka;Michal Šrédl;Aleš Bourek

  • Affiliations:
  • Faculty of Informatics, Department of Information Technologies, Masaryk University, Brno, Czech Republic;Faculty of Informatics, Department of Information Technologies, Masaryk University, Brno, Czech Republic;Faculty of Medicine, Department of Biophysics, Masaryk University, Brno, Czech Republic

  • Venue:
  • CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
  • Year:
  • 2003

Abstract

This paper describes experiments that used genetic algorithms (GAs) to search for significant word associations (phrases) in unstructured text documents obtained from the Internet in a specialized area of medicine. GAs can evolve sets of word associations, each with an assigned significance weight, for the purpose of document categorization (here into two classes: relevant and irrelevant documents). The categorization was about as reliable as the naïve Bayes method using only individual words; in addition, the GAs provided phrases consisting of one, two, or three words. The selected phrases were quite meaningful from a human point of view.
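The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, the fitness function, or the genetic operators. The sketch assumes that a chromosome is one real-valued weight per candidate phrase, that a document counts as relevant when the summed weights of the phrases it contains are positive, and that fitness is plain classification accuracy. The names `extract_phrases`, `accuracy`, and `evolve`, all parameter values, and the four toy documents are hypothetical.

```python
import random


def extract_phrases(text, max_len=3):
    """Return the set of contiguous phrases of 1 to max_len words in text."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def accuracy(weights, vocab, phrase_sets, labels):
    """Fraction of documents classified correctly by a weighted phrase sum."""
    correct = 0
    for phrases, label in zip(phrase_sets, labels):
        score = sum(w for w, p in zip(weights, vocab) if p in phrases)
        correct += (score > 0) == label  # assumed rule: positive score means relevant
    return correct / len(labels)


def evolve(docs, labels, generations=30, pop_size=20, seed=0):
    """Evolve one weight per phrase; all settings here are illustrative assumptions."""
    rng = random.Random(seed)
    phrase_sets = [extract_phrases(d) for d in docs]
    vocab = sorted(set().union(*phrase_sets))

    def fitness(individual):
        return accuracy(individual, vocab, phrase_sets, labels)

    population = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best individuals
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by occasional Gaussian mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [g + rng.gauss(0, 0.3) if rng.random() < 0.1 else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return vocab, best, fitness(best)


# Made-up toy data, for illustration only (True = relevant).
docs = [
    "laser therapy reduces chronic pain",
    "chronic pain treatment with laser therapy",
    "cheap flights and hotel deals",
    "hotel deals for summer holidays",
]
labels = [True, True, False, False]

vocab, weights, acc = evolve(docs, labels)
ranked = sorted(zip(weights, vocab), reverse=True)
print("training accuracy:", acc)
print("highest-weighted phrases:", [phrase for _, phrase in ranked[:5]])
```

Sorting the evolved weights gives a ranked list of phrases, which corresponds to the kind of output the abstract calls significant word associations. A real experiment would measure accuracy on held-out documents rather than on the training set.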