Dynamic Load Balancing Model: Preliminary Results for Parallel Pseudo-search Engine Indexers/Crawler Mechanisms Using MPI and Genetic Programming

  • Authors:
  • Reginald L. Walker

  • Venue:
  • VECPAR '00 Selected Papers and Invited Talks from the 4th International Conference on Vector and Parallel Processing
  • Year:
  • 2000

Abstract

Methodologies derived from Genetic Programming (GP) and Knowledge Discovery in Databases (KDD) were used in the parallel implementation of an indexer simulator that emulates current World Wide Web (WWW) search engine indexers. This indexer followed the indexing strategies employed by Alta Vista and Inktomi, which index each word in each Web document. The insights gained from the initial implementation of this simulator have resulted in the initial phase of the adaptation of a biological model. The biological model will offer a basis for future developments associated with an integrated Pseudo-Search Engine. The basic characteristics exhibited by the model will be translated so as to develop a model of an integrated search engine using GP. The evolutionary processes exhibited by this biological model will provide mechanisms not only for the storage, processing, and retrieval of valuable information but also for Web crawlers and for an advanced communication system. The current Pseudo-Search Engine Indexer, capable of organizing limited subsets of Web documents, provides a foundation for the first simulator of this model. Adaptation of the model for the refinement of the Pseudo-Search Engine establishes order in the inherent interactions between the indexer, crawler, and browser mechanisms by including the social (hierarchical) structure and simulated behavior of this complex system. The simulation of behavior will engender mechanisms that are controlled and coordinated in their various levels of complexity. This unique model will also provide a foundation for an evolutionary expansion of the search engine as the number of WWW documents continues to grow. The simulator results were generated using Message Passing Interface (MPI) on a network of SUN workstations and an IBM SP2 computer system.
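
The strategy of indexing each word in each Web document, with the work spread over several processes, can be illustrated with a minimal sketch. This is not the paper's simulator: the function names (`tokenize`, `index_partition`, `partition`, `merge`, `build_index`), the round-robin split, and the tokenization rule are all assumptions made for illustration, and the "workers" are simulated in one process rather than run under MPI.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (an assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_partition(docs):
    """Index every word of every document in one worker's share.

    docs is a list of (doc_id, text) pairs; the result maps each
    word to the set of document ids that contain it.
    """
    index = defaultdict(set)
    for doc_id, text in docs:
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def partition(docs, n_workers):
    """Static round-robin split of the documents (hypothetical policy)."""
    return [docs[i::n_workers] for i in range(n_workers)]


def merge(partials):
    """Combine per-worker indexes into one global index."""
    merged = defaultdict(set)
    for partial in partials:
        for word, ids in partial.items():
            merged[word] |= ids
    return merged


def build_index(docs, n_workers=2):
    """Simulate n_workers indexers in-process and merge their results."""
    return merge(index_partition(part) for part in partition(docs, n_workers))
```

In an MPI setting, each share produced by `partition` would presumably go to a separate process and `merge` would correspond to a gather step. A dynamic load balancer would replace the static round-robin split with one that reacts to how much work each process still has.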