An analysis of XML binary formats and compression

  • Authors:
  • Christopher J. Augeri; Barry E. Mullins; Dursun A. Bulutoglu; Rusty O. Baldwin; Leemon C. Baird, III

  • Affiliations:
  • Department of Electrical and Computer Engineering, Air Force Institute of Technology (AFIT), Wright Patterson Air Force Base, OH; Department of Electrical and Computer Engineering, Air Force Institute of Technology (AFIT), Wright Patterson Air Force Base, OH; Department of Mathematics and Statistics, Air Force Institute of Technology (AFIT), Wright Patterson Air Force Base, OH; Department of Electrical and Computer Engineering, Air Force Institute of Technology (AFIT), Wright Patterson Air Force Base, OH; Department of Computer Science, United States Air Force Academy (USAFA), United States Air Force Academy, CO

  • Venue:
  • ecs'07: Experimental computer science
  • Year:
  • 2007

Abstract

XML simplifies data exchange amongst disparate computers, but it is notoriously verbose and has spawned the development of a variety of XML compressors and binary formats. Some formats allow streaming access to the data without complete decompression. We present an XML test file corpus, akin to corpora such as the Canterbury corpus, and a combined efficiency metric integrating compression ratio and speed. We then use the test corpus to assess 14 general-purpose and XML-specific compressors against the efficiency metric and other metrics. After constructing linear regression models, we identify the factors influencing compressor selection and then rank the best-performing compressors.
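The abstract does not give the formula for the combined efficiency metric, so the following is only a minimal sketch of how compression ratio and speed might be measured and folded into a single score. It assumes Python's standard-library `zlib`, `bz2`, and `lzma` modules as stand-ins for general-purpose compressors, a synthetic XML document (the hypothetical `make_sample_xml`) in place of the test corpus, and a hypothetical weighted geometric mean (`combined_efficiency`) that is not the authors' metric.

```python
import bz2
import lzma
import time
import zlib


def make_sample_xml(n_records: int = 2000) -> bytes:
    # Hypothetical synthetic document: repeated tag names make XML highly compressible.
    rows = "".join(
        f'<record id="{i}"><name>item{i}</name><value>{i * 3}</value></record>'
        for i in range(n_records)
    )
    return f'<?xml version="1.0"?><records>{rows}</records>'.encode("utf-8")


# General-purpose stdlib compressors used as stand-ins; no XML-specific ones here.
COMPRESSORS = {
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def measure(data: bytes) -> dict:
    """Return compression ratio and (de)compression throughput per compressor."""
    results = {}
    for name, (comp, decomp) in COMPRESSORS.items():
        t0 = time.perf_counter()
        packed = comp(data)
        t1 = time.perf_counter()
        restored = decomp(packed)
        t2 = time.perf_counter()
        assert restored == data  # verify a lossless round trip
        size_mb = len(data) / 1e6
        results[name] = {
            # Assumed definition: original size / compressed size (higher is better).
            "ratio": len(data) / len(packed),
            # Guard against a zero-length timing interval on very fast runs.
            "comp_mbps": size_mb / max(t1 - t0, 1e-9),
            "decomp_mbps": size_mb / max(t2 - t1, 1e-9),
        }
    return results


def combined_efficiency(ratio, comp_mbps, decomp_mbps, w=(0.5, 0.25, 0.25)):
    # Hypothetical weighted geometric mean of ratio and speeds; NOT the paper's metric.
    return ratio ** w[0] * comp_mbps ** w[1] * decomp_mbps ** w[2]


if __name__ == "__main__":
    data = make_sample_xml()
    for name, m in measure(data).items():
        score = combined_efficiency(m["ratio"], m["comp_mbps"], m["decomp_mbps"])
        print(
            f"{name:5s} ratio={m['ratio']:6.1f} "
            f"comp={m['comp_mbps']:8.1f} MB/s "
            f"decomp={m['decomp_mbps']:8.1f} MB/s score={score:7.2f}"
        )
```

A geometric mean is used in this sketch so that no single factor dominates merely because of its units (a ratio versus MB/s); the weights are arbitrary and would need to reflect an application's priorities, for example favoring decompression speed when data is streamed to many readers.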