An adaptive, fast, and safe XML parser based on byte sequences memorization

  • Authors:
  • Toshiro Takase; Hisashi Miyashita; Toyotaro Suzumura; Michiaki Tatsubori

  • Affiliations:
  • IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan;IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan;IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan;IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan

  • Venue:
  • WWW '05 Proceedings of the 14th international conference on World Wide Web
  • Year:
  • 2005

Abstract

XML (Extensible Markup Language) processing can incur significant runtime overhead in XML-based infrastructural middleware such as Web service application servers. This paper proposes a novel mechanism for efficiently processing similar XML documents. Given a new XML document as a byte sequence, the proposed XML parser, Deltarser, normally skips syntactic analysis and instead matches the document against previously processed ones, reusing their results. The parser is adaptive because it partially parses, and then remembers, XML document fragments that it has not encountered before. It is also safe because its partial parsing correctly checks the well-formedness of documents. Our implementation complies with the Java API for XML Processing (JAXP) 1.1 specification (JSR 63). We evaluated Deltarser's performance with messages from Google Web services. Compared to Piccolo (and Apache Xerces), it parses 35% (106%) faster in a server-side use-case scenario and 73% (126%) faster in a client-side use-case scenario.
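
The core idea in the abstract, reusing earlier parse results when the incoming bytes have been seen before, can be illustrated with a much simplified sketch. The Python code below is not the authors' Java implementation. It memoizes whole documents only, whereas the abstract describes matching and partial parsing at the level of document fragments. The names `RecordingHandler`, `MemoizingParser`, and `full_parses` are hypothetical and chosen for illustration.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Collects SAX events as plain tuples so they can be stored and reused."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


class MemoizingParser:
    """Hypothetical whole-document memoizer (a simplification, see lead-in)."""

    def __init__(self):
        self._cache = {}       # raw document bytes -> tuple of recorded events
        self.full_parses = 0   # number of successful real syntactic analyses

    def parse(self, document: bytes):
        cached = self._cache.get(document)
        if cached is not None:
            # Byte-for-byte match with an earlier document: skip parsing.
            return cached
        handler = RecordingHandler()
        # Unseen bytes: fall back to a real parse, which also checks
        # well-formedness (raises xml.sax.SAXParseException on failure).
        xml.sax.parseString(document, handler)
        self.full_parses += 1
        events = tuple(handler.events)
        self._cache[document] = events
        return events
```

Calling `parse` twice with the same bytes runs the underlying SAX parser once and returns the stored events the second time. A document that is not well-formed is never cached, because the fallback parse raises before anything is stored.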