A scalable and distributed NLP architecture for web document annotation

  • Authors:
  • Julien Deriviere; Thierry Hamon; Adeline Nazarenko

  • Affiliations:
  • LIPN – UMR CNRS 7030, Villetaneuse, France (all three authors)

  • Venue:
  • FinTAL'06: Proceedings of the 5th International Conference on Advances in Natural Language Processing
  • Year:
  • 2006

Abstract

In the context of the ALVIS project, which aims to integrate linguistic information into topic-specific search engines, we develop an NLP architecture that linguistically annotates large collections of web documents. This context forces us to address the scalability of Natural Language Processing. The platform can be viewed as a framework that integrates existing NLP tools. We focus on the platform's efficiency, which we improve by distributing linguistic processing across several machines. We carry out an experiment on 55,329 web documents in the biology domain. This 79-million-word collection was processed in 3 days on 16 computers.
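The figures reported above imply average processing rates. The short sketch below derives them. The function name `throughput` is hypothetical, and the calculation assumes continuous wall-clock processing with the load spread evenly over the 16 machines; the abstract states neither. Because "79 million" and "3 days" are rounded, the results are approximate.

```python
# Rough throughput figures derived from the abstract's rounded numbers.
# Assumption: processing ran continuously (wall-clock time) and the load was
# spread evenly over the machines; neither is stated in the abstract.

SECONDS_PER_DAY = 24 * 60 * 60


def throughput(words: int, documents: int, days: float, machines: int) -> dict:
    """Return average annotation rates for a batch run (hypothetical helper)."""
    seconds = days * SECONDS_PER_DAY
    words_per_second = words / seconds
    return {
        "words_per_document": words / documents,
        "words_per_second": words_per_second,
        # Even load split across machines is an assumption.
        "words_per_second_per_machine": words_per_second / machines,
        "documents_per_machine_hour": documents / (days * 24 * machines),
    }


if __name__ == "__main__":
    stats = throughput(words=79_000_000, documents=55_329, days=3, machines=16)
    for name, value in stats.items():
        print(f"{name}: {value:.1f}")
```

With the abstract's figures this gives roughly 1,428 words per document, about 305 words per second for the whole cluster, about 19 words per second per machine, and about 48 documents per machine-hour.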