Moving Text Analysis Tools to the Cloud

  • Authors:
  • Himanshu Vashishtha; Michael Smit; Eleni Stroulia

  • Venue:
  • SERVICES '10 Proceedings of the 2010 6th World Congress on Services
  • Year:
  • 2010

Abstract

Text analysis is an important computational task: unstructured data, including text, abounds and can yield useful information and knowledge in a variety of areas. In collaboration with digital humanists, we have begun to examine the opportunities the cloud offers for improving the response times of text-analysis tools, so that users can comparatively analyze large text corpora across a variety of dimensions. To that end, we have started migrating existing text-analysis tools to the cloud, beginning with TAPoR, the Text Analysis Portal for Research. In this paper, we discuss our experience redesigning and re-implementing four basic TAPoR operations on Hadoop, and we report on the performance improvements enabled by the migration.
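
The abstract does not name the four operations or describe how they were mapped onto Hadoop. As a rough illustration only, the sketch below assumes one of them resembles a word-frequency listing and shows how such an operation might be split into map, shuffle, and reduce steps. It is plain Python that imitates the MapReduce pattern in a single process; it does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_frequencies) are hypothetical.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every token in one document."""
    # doc_id is unused here; it stands in for the input key a real job would receive.
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the partial counts for one word."""
    return word, sum(counts)


def word_frequencies(corpus):
    """Run the three steps over a corpus given as {doc_id: text}."""
    pairs = [
        pair
        for doc_id, text in corpus.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical two-document corpus, for illustration only.
    corpus = {"doc1": "the cat and the hat", "doc2": "The end"}
    print(word_frequencies(corpus))
```

Because each document is mapped independently and each word is reduced independently, both steps can in principle be spread across many machines, which is the property a migration to Hadoop would rely on for large corpora.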