An adaptive, fast, and safe XML parser based on byte sequences memorization

Authors:
Toshiro Takase; Hisashi Miyashita; Toyotaro Suzumura; Michiaki Tatsubori
Affiliations:
IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan;IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan;IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan;IBM Tokyo Research Laboratory, Yamato, Kanagawa, Japan
Venue:
WWW '05 Proceedings of the 14th international conference on World Wide Web
Year:
2005

Abstract

XML (Extensible Markup Language) processing can incur significant runtime overhead in XML-based infrastructural middleware such as Web service application servers. This paper proposes a novel mechanism for efficiently processing similar XML documents. Given a new XML document as a byte sequence, the proposed parser usually skips syntactic analysis and instead matches the document against previously processed ones, reusing the results of that earlier processing. The parser is adaptive because it partially parses, and then remembers, XML document fragments that it has not encountered before. It is also safe because its partial parsing correctly checks the well-formedness of documents. Our implementation of the proposed parser, Deltarser, complies with JSR 63, the Java API for XML Processing (JAXP) 1.1 specification. We evaluated Deltarser's performance on Google Web services messages. Compared with Piccolo (and Apache Xerces), it parses 35% (106%) faster in a server-side use-case scenario and 73% (126%) faster in a client-side use-case scenario.
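
The memoization idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the Deltarser implementation: it remembers whole documents rather than fragments, and it falls back to the stock JAXP SAX parser for anything it has not seen, which is also where well-formedness gets checked. The class name `MemoizingParserSketch`, the string event encoding, and the `fullParses` counter are all hypothetical names introduced for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: whole-document memoization keyed by the raw bytes.
// The paper's parser works on fragments; this only shows the reuse idea.
final class MemoizingParserSketch {
    // ByteBuffer compares and hashes by content, so it can serve as a map key.
    private final Map<ByteBuffer, List<String>> cache = new HashMap<>();
    private int fullParses = 0;

    /** Number of times the underlying SAX parser completed a full parse. */
    int fullParses() {
        return fullParses;
    }

    /** Returns the recorded events, reusing them if these bytes were seen before. */
    List<String> parse(byte[] doc) {
        ByteBuffer key = ByteBuffer.wrap(doc.clone()); // defensive copy of the key
        List<String> remembered = cache.get(key);
        if (remembered != null) {
            return remembered; // byte-level match: no syntactic analysis needed
        }
        final List<String> events = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("text:" + new String(ch, start, length));
            }
        };
        try {
            // Unseen input: a real parse, which rejects documents that are not well-formed.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(doc), recorder);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("document could not be parsed", e);
        }
        fullParses++;
        List<String> result = Collections.unmodifiableList(events);
        cache.put(key, result); // remember only documents that parsed successfully
        return result;
    }

    public static void main(String[] args) {
        MemoizingParserSketch parser = new MemoizingParserSketch();
        byte[] message = "<q><term>xml</term></q>".getBytes(StandardCharsets.UTF_8);
        parser.parse(message);
        parser.parse(message); // served from the cache
        System.out.println("full parses: " + parser.fullParses());
    }
}
```

Keying on the exact bytes means that any difference, even in whitespace, causes a miss and a full parse. That limitation is what a fragment-level approach such as the one the abstract describes is meant to address.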