Distributed XML processing: Theory and applications

  • Authors: Dirceu Cavendish; K. Selçuk Candan

  • Affiliations: Kyushu Institute of Technology, 3-8-1 Asano, Kokurakita-Ku, Kitakyushu, Fukuoka 802-0001, Japan; School of Computing and Informatics, Computer Science and Engineering Department, Ira Fulton School of Engineering, Arizona State University, Tempe, AZ 85287-5406, USA

  • Venue: Journal of Parallel and Distributed Computing
  • Year: 2008

Abstract

Basic message processing tasks common in Web service messaging, such as well-formedness checking and grammar validation, can be off-loaded from the service providers' own infrastructures. The traditional way to alleviate the overhead caused by these tasks is to use firewalls and gateways. However, these single-processing-point solutions do not scale well. To enable effective off-loading of common processing tasks, we introduce the Prefix Automata SyStem (PASS), a middleware architecture that processes the XML payloads of Web service SOAP messages in a distributed fashion while they are routed towards Web servers. PASS is based on a network of automata, where PASS-nodes independently but cooperatively process parts of the SOAP message XML payload. PASS allows autonomous and pipelined in-network processing of XML documents, where parts of a large message payload are processed by various PASS-nodes in tandem or simultaneously. The non-blocking, non-wasteful, and autonomous operation of the PASS middleware is achieved by relying on the prefix nature of basic XML processing tasks, such as well-formedness checking and DTD validation. These properties ensure minimal distributed processing management overhead. We present necessary and sufficient conditions for outsourcing XML document processing tasks to PASS, and provide guidelines for making suitable applications PASS-processable. Using event-driven simulations, we demonstrate the advantages of migrating XML document processing tasks, such as well-formedness checking, DTD parsing, and filtering, to the network.
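
To make the "prefix nature" of well-formedness checking concrete, the following is a minimal Python sketch, not the authors' PASS implementation. It assumes a heavily simplified notion of well-formedness (balanced element tags only; comments, CDATA sections, and `>` characters inside attribute values are not handled). The names `State`, `process_segment`, `run_pipeline`, and `is_well_formed` are hypothetical and chosen for illustration. The point it illustrates is that the result of checking a prefix of the document can be summarized in a small state (here, the stack of open elements plus any unfinished tag text), so a downstream node can continue from that state without re-reading the prefix.

```python
import re
from dataclasses import dataclass

# Simplified tag pattern (assumption): opening, closing, and self-closing
# element tags only; '>' inside attribute values is not supported.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


@dataclass(frozen=True)
class State:
    """Summary of the prefix processed so far, handed from node to node."""
    stack: tuple = ()   # names of elements that are still open
    pending: str = ""   # unfinished tag text cut off at a segment boundary
    ok: bool = True     # False once a mismatched closing tag has been seen


def process_segment(state: State, segment: str) -> State:
    """Work done by one (hypothetical) node on its part of the payload."""
    if not state.ok:
        return state  # an earlier node already rejected the document
    text = state.pending + segment
    # Hold back a trailing tag that is split across the segment boundary.
    cut = text.rfind("<")
    if cut != -1 and ">" not in text[cut:]:
        text, pending = text[:cut], text[cut:]
    else:
        pending = ""
    stack = list(state.stack)
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if closing:
            if not stack or stack.pop() != name:
                return State(tuple(stack), "", False)
        elif not self_closing:
            stack.append(name)
    return State(tuple(stack), pending, True)


def run_pipeline(segments) -> State:
    """Pass the state through a chain of nodes, one segment per node."""
    state = State()
    for segment in segments:
        state = process_segment(state, segment)
    return state


def is_well_formed(state: State) -> bool:
    """Final verdict: no mismatch, no open elements, no dangling tag text."""
    return state.ok and not state.stack and not state.pending


if __name__ == "__main__":
    doc = "<order><item id='1'/><note>hi</note></order>"
    chunks = [doc[i:i + 7] for i in range(0, len(doc), 7)]
    print(is_well_formed(run_pipeline(chunks)))
```

In this sketch, splitting the document at arbitrary byte offsets and chaining `process_segment` calls yields the same final state as processing the whole document in one call, which is the property that lets each node work on its segment without coordinating with the others beyond the state hand-off.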