Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT

  • Authors:
  • David E. Simmen; Frederick Reiss; Yunyao Li; Suresh Thalamati

  • Affiliations:
  • IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA; IBM Almaden Research Center, San Jose, CA, USA

  • Venue:
  • Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2009

Abstract

Enterprise mashup scenarios often involve feeds derived from data created primarily for eye consumption, such as email, news, calendars, blogs, and web feeds. These data sources can test the capabilities of current data mashup products, as the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations. Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.
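
The idea in the abstract, that an annotator turns feed text into structured attributes that a mashup can then join on, can be illustrated with a minimal sketch. This is not SystemT, AQL, or any MashupHub API: the feed items, the account table, the regular expression, and the function names `annotate` and `mashup` are all hypothetical, and a plain regex stands in for a real rule-based annotator.

```python
import re

# Hypothetical unstructured feed items, as might arrive from a news or email feed.
FEED_ITEMS = [
    {"id": 1, "text": "Acme Corp announced a merger; contact Jane at 408-555-0101."},
    {"id": 2, "text": "Globex Inc reported record earnings this quarter."},
    {"id": 3, "text": "Weather update: sunny skies expected in San Jose."},
]

# Hypothetical structured source, keyed by company name.
ACCOUNTS = {
    "Acme Corp": {"account_manager": "R. Patel", "region": "West"},
    "Globex Inc": {"account_manager": "L. Chen", "region": "East"},
}

# Toy extraction rule (an assumption for illustration): a capitalized word
# followed by "Corp" or "Inc" is treated as a company mention.
COMPANY_PATTERN = re.compile(r"\b([A-Z][a-z]+ (?:Corp|Inc))\b")


def annotate(item):
    """Return one structured annotation per company mention in the item's text."""
    return [
        {"feed_id": item["id"], "company": m.group(1), "span": m.span(1)}
        for m in COMPANY_PATTERN.finditer(item["text"])
    ]


def mashup(feed_items, accounts):
    """Join extracted company annotations with the structured account records."""
    rows = []
    for item in feed_items:
        for ann in annotate(item):
            account = accounts.get(ann["company"])
            if account is not None:
                # Merge the extracted attributes with the matching account row.
                rows.append({**ann, **account})
    return rows


RESULT = mashup(FEED_ITEMS, ACCOUNTS)

if __name__ == "__main__":
    for row in RESULT:
        print(row)
```

The join key (`company`) exists only after extraction, which is the point the abstract makes: without the annotation step, the feed text offers no attribute to join or aggregate on.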