Data-Continuous SQL Process Model

  • Authors:
  • Qiming Chen;Meichun Hsu

  • Affiliations:
  • HP Labs, Hewlett Packard Co, Palo Alto, USA;HP Labs, Hewlett Packard Co, Palo Alto, USA

  • Venue:
  • OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:
  • Year:
  • 2008

Abstract

Motivated by automating enterprise information derivation processes, we propose a new kind of business process, the Data-Continuous SQL Process (DCSP), which is data-stream driven and continuously running. The basic operators of a DCSP are database User Defined Functions (UDFs). However, we introduce a special kind of UDF, the Relation Valued Function (RVF), with both input and return values specified as relations. An RVF represents a relational transformation and can be composed with other relational operators. We allow an RVF to be triggered repeatedly by stream inputs, timers or event-conditions. The sequence of executions generates a data stream. To capture such data continuation semantics we introduce the notion of station for hosting a continuously-executed RVF, and the notion of pipe as the FIFO stream container for asynchronous communication between stations. A station is specified with the triggering factors and the outgoing pipes. A pipe is strongly typed by a relation schema with a stream key for identifying its elements. As an abstract object, a pipe can be implemented as a queue or stream table. To allow a DCSP to be constructed from stations and pipes recursively, we introduce the notion of Data Continuous Query (DCQ), which is a query applied to a stream data source (a stream table, a station via a pipe, or recursively a DCQ) with well-defined data continuation semantics. A DCQ itself can be treated as a station, meaning that stations can be constructed from existing ones recursively in terms of SQL. Based on these notions a DCSP is modeled as a graph of stations (nodes) and pipes (links) and represented by a set of correlated DCQs. Specifying DCSP in SQL allows us to take advantage of SQL in expressing relational transformations on the stream elements, and potentially in pushing DCSP execution down to the database layer for performance and scalability. The implementation issues based on parallel database technology are discussed.
The proposed approach represents a major shift in process management from one-time execution to data stream driven, open-ended execution, and an initial step in bringing BPM technology and database technology together under the data-continuation semantics.
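
The station and pipe notions described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use SQL or a database: the names `Pipe`, `Station`, `run_pending`, `avg_per_minute` and `high_only`, the sensor-reading schema, and the choice to model a relation as a list of dictionaries are all assumptions made for illustration. Only triggering by stream input is shown; timers and event-conditions are left out.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Relation = List[Row]                     # assumption: a relation is a list of rows
RVF = Callable[[Relation], Relation]     # relation in, relation out


class Pipe:
    """FIFO stream container typed by a relation schema and a stream key."""

    def __init__(self, schema: Tuple[str, ...], stream_key: str) -> None:
        if stream_key not in schema:
            raise ValueError("stream key must be a column of the schema")
        self.schema = schema
        self.stream_key = stream_key
        self._queue: Deque[Relation] = deque()

    def put(self, element: Relation) -> None:
        # Strong typing: every row must have exactly the schema's columns.
        for row in element:
            if set(row) != set(self.schema):
                raise TypeError(f"row {row!r} does not match schema {self.schema!r}")
        self._queue.append(element)

    def get(self) -> Relation:
        return self._queue.popleft()     # first in, first out

    def __len__(self) -> int:
        return len(self._queue)


class Station:
    """Hosts one RVF and re-executes it for each element of its input pipe."""

    def __init__(self, rvf: RVF, source: Pipe, outgoing: Sequence[Pipe]) -> None:
        self.rvf = rvf
        self.source = source
        self.outgoing = list(outgoing)

    def run_pending(self) -> int:
        """Run the RVF once per queued input element; return the number of runs."""
        runs = 0
        while len(self.source) > 0:
            result = self.rvf(self.source.get())
            for pipe in self.outgoing:
                pipe.put(result)         # each execution yields one stream element
            runs += 1
        return runs


def avg_per_minute(readings: Relation) -> Relation:
    """Hypothetical RVF: average the readings of one non-empty stream element."""
    minute = readings[0]["minute"]
    total = sum(float(r["value"]) for r in readings)
    return [{"minute": minute, "avg_value": total / len(readings)}]


def high_only(averages: Relation) -> Relation:
    """Hypothetical RVF: keep only averages above a fixed threshold."""
    return [r for r in averages if float(r["avg_value"]) > 20.0]


def demo() -> List[Relation]:
    # Graph: readings -> [avg_per_minute] -> averages -> [high_only] -> alerts
    readings = Pipe(("minute", "sensor", "value"), "minute")
    averages = Pipe(("minute", "avg_value"), "minute")
    alerts = Pipe(("minute", "avg_value"), "minute")
    first = Station(avg_per_minute, readings, [averages])
    second = Station(high_only, averages, [alerts])

    readings.put([{"minute": 1, "sensor": "a", "value": 10.0},
                  {"minute": 1, "sensor": "b", "value": 20.0}])
    readings.put([{"minute": 2, "sensor": "a", "value": 30.0},
                  {"minute": 2, "sensor": "b", "value": 50.0}])

    first.run_pending()
    second.run_pending()
    return [alerts.get() for _ in range(len(alerts))]


if __name__ == "__main__":
    print(demo())
```

In this sketch the two stations and three pipes form a small linear graph of the kind the abstract describes, with each pipe rejecting rows that do not fit its schema. A database-backed version would presumably replace the in-memory queue with a queue or stream table, as the abstract suggests.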