Characterizing large text corpora using a maximum variation sampling genetic algorithm

Authors:
Robert M. Patton; Thomas E. Potok
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, TN; Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
Proceedings of the 8th annual conference on Genetic and evolutionary computation
Year:
2006

Abstract

An enormous amount of information is available via the Internet, much of it in the form of text-based documents. These documents cover a variety of topics that are vitally important to the scientific, business, and defense/security communities. Although many techniques exist for processing and analyzing such data, quickly characterizing a large set of documents remains challenging. Previous work has successfully demonstrated the use of a genetic algorithm to provide a representative subset of text documents via adaptive sampling. In this work, we further expand and explore this approach on much larger data sets using a parallel genetic algorithm (GA) with adaptive parameter control. Experimental results are presented and discussed.
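The abstract does not specify the chromosome encoding, fitness function, or parallel model, so the following is only a minimal single-population sketch of maximum variation sampling with a GA, under stated assumptions: each document is a term-frequency vector, a chromosome is a set of k distinct document indices, fitness is the sum of pairwise cosine distances within the sample, and a simple hypothetical adaptive rule lowers the mutation rate while the best score improves and raises it when the search stalls. All function names, parameters, and rate constants are illustrative, not the authors' implementation.

```python
import math
import random
from collections import Counter
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse term-frequency mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0 or nb == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (na * nb)


def fitness(subset, docs):
    """Assumed variation measure: sum of pairwise distances among chosen documents."""
    return sum(cosine_distance(docs[i], docs[j]) for i, j in combinations(subset, 2))


def tournament(pop, scores, rng, size=3):
    """Return the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), min(size, len(pop)))
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, k, rng):
    """Child = k distinct indices drawn from the union of both parents' samples."""
    return rng.sample(sorted(set(a) | set(b)), k)


def mutate(subset, n_docs, rate, rng):
    """With probability `rate`, replace each chosen index by an unchosen document."""
    out = list(subset)
    for pos in range(len(out)):
        if rng.random() < rate:
            unused = [d for d in range(n_docs) if d not in out]
            if unused:
                out[pos] = rng.choice(unused)  # stays distinct: d was not in out
    return out


def max_variation_sample(docs, k, pop_size=20, generations=50, seed=0):
    """Hypothetical GA searching for the k-document subset with maximum variation."""
    rng = random.Random(seed)
    n = len(docs)
    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    rate = 0.1  # initial per-gene mutation rate (assumed value)
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind, docs) for ind in pop]
        top = max(range(pop_size), key=lambda i: scores[i])
        # Assumed adaptive rule: exploit while improving, explore when stalled.
        if scores[top] > best_score:
            best, best_score = list(pop[top]), scores[top]
            rate = max(0.01, rate * 0.8)
        else:
            rate = min(0.5, rate * 1.5)
        new_pop = [list(best)]  # elitism: keep the best sample found so far
        while len(new_pop) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            child = crossover(parent_a, parent_b, k, rng)
            new_pop.append(mutate(child, n, rate, rng))
        pop = new_pop
    return sorted(best), best_score


if __name__ == "__main__":
    corpus = [
        "genetic algorithm parameter control",
        "genetic algorithm adaptive mutation",
        "mammography report analysis",
        "network intrusion detection",
        "genetic algorithm parallel islands",
    ]
    docs = [Counter(text.split()) for text in corpus]
    sample, score = max_variation_sample(docs, k=3)
    print(sample, score)
```

Because pairwise cosine distances between non-negative term vectors lie in [0, 1], the fitness of a k-document sample is bounded by k(k-1)/2. A parallel variant would typically run several such populations (islands) and periodically migrate their best samples; this sketch does not attempt that.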