Moving Text Analysis Tools to the Cloud

Authors:
Himanshu Vashishtha;Michael Smit;Eleni Stroulia
Venue:
SERVICES '10 Proceedings of the 2010 6th World Congress on Services
Year:
2010

Abstract

Text analysis is an important computational task because unstructured data, including text, abound and can provide interesting information and knowledge in a variety of areas. In our collaboration with Digital Humanists, we have started to examine the opportunities that the cloud offers for improving the response times of text-analysis tools, so that users can comparatively analyze large text corpora across a variety of dimensions. To that end, we have started migrating existing text-analysis tools to the cloud, beginning with TAPoR, the Text Analysis Portal for Research. In this paper, we discuss our experience redesigning and re-implementing four basic TAPoR operations on Hadoop, and we report on the performance improvements enabled by the migration.
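
The abstract does not name the four TAPoR operations or show how they were re-implemented. Purely as an illustration of the MapReduce programming model that Hadoop provides, the following single-process Python sketch counts word frequencies across a small corpus. The choice of operation, the function names (map_phase, shuffle, reduce_phase, word_frequencies), and the tokenization rule are assumptions made for this example; they are not the authors' code, and the sketch does not use the Hadoop API.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum the partial counts for one word."""
    return word, sum(counts)


def word_frequencies(corpus):
    """Run map, shuffle, and reduce in sequence over {doc_id: text}."""
    pairs = [
        pair
        for doc_id, text in corpus.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical two-document corpus, for illustration only.
    corpus = {"doc1": "The cat sat", "doc2": "the cat ran"}
    print(word_frequencies(corpus))
```

On a real cluster the map and reduce steps would run in parallel on different nodes over partitions of the corpus, which is the general source of the response-time gains a migration of this kind aims for.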