Enabling enterprise mashups over unstructured text feeds with InfoSphere MashupHub and SystemT

Authors:
David E. Simmen;Frederick Reiss;Yunyao Li;Suresh Thalamati
Affiliations:
IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA;IBM Almaden Research Center, San Jose, CA, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

Enterprise mashup scenarios often involve feeds derived from data created primarily for eye consumption, such as email, news, calendars, blogs, and web feeds. These data sources can test the capabilities of current data mashup products, as the attributes needed to perform join, aggregation, and other operations are often buried within unstructured feed text. Information extraction technology is a key enabler in such scenarios, using annotators to convert unstructured text into structured information that can facilitate mashup operations. Our demo presents the integration of SystemT, an information extraction system from IBM Research, with IBM's InfoSphere MashupHub. We show how to build domain-specific annotators with SystemT's declarative rule language, AQL, and how to use these annotators to combine structured and unstructured information in an enterprise mashup.
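
The role of an annotator described above can be illustrated with a minimal sketch. It is written in plain Python, not in AQL, and it does not use SystemT or MashupHub; the feed items, the account table, and the function names `annotate_companies` and `mashup` are all hypothetical and chosen only for illustration. A simple dictionary-style annotator pulls company names out of feed text, and the extracted attribute then serves as the join key against a structured table.

```python
import re

# Hypothetical unstructured feed items; not taken from the paper.
FEED = [
    {"id": 1, "text": "Outage reported by Acme Corp on the billing portal."},
    {"id": 2, "text": "Globex renewed its support contract today."},
    {"id": 3, "text": "Weekly team lunch is moved to Friday."},
]

# Hypothetical structured source: company name -> support tier.
ACCOUNTS = {"Acme Corp": "gold", "Globex": "silver"}


def annotate_companies(text, names):
    """Return (name, start, end) spans for each known company name in text."""
    # Longer names come first so they win over shorter overlapping ones.
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered))
    return [(m.group(0), m.start(), m.end()) for m in pattern.finditer(text)]


def mashup(feed, accounts):
    """Join feed items with the account table on the extracted company name."""
    rows = []
    for item in feed:
        for name, _start, _end in annotate_companies(item["text"], accounts):
            rows.append(
                {"feed_id": item["id"], "company": name, "tier": accounts[name]}
            )
    return rows


if __name__ == "__main__":
    for row in mashup(FEED, ACCOUNTS):
        print(row)
```

Feed item 3 mentions no known company, so it yields no annotation and drops out of the join. A real annotator would rely on much richer rules than a fixed name list, but the flow is the same: extract a structured attribute from text, then use it in an ordinary mashup operation.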