HPar: A practical parallel parser for HTML–taming HTML complexities for parallel parsing

  • Authors:
  • Zhijia Zhao; Michael Bebenita; Dave Herman; Jianhua Sun; Xipeng Shen

  • Affiliations:
  • College of William and Mary, Williamsburg, VA, USA; Mozilla Corporation, CA, USA; Mozilla Corporation, CA, USA; College of William and Mary, Williamsburg, VA, USA; College of William and Mary, Williamsburg, VA, USA

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2013

Abstract

Parallelizing HTML parsing is challenging because of the complexities of HTML documents and the inherent dependencies in the HTML parsing algorithm. As a result, despite numerous studies of parallel parsing, HTML parsing remains sequential today. It is one of the last barriers to fully parallelizing browser operations to minimize the browser's response time, an important factor in user experience, especially on portable devices. This article provides a comprehensive analysis of the special complexities of parallel HTML parsing. It also presents a systematic exploration of how to overcome those difficulties through specially designed speculative parallelizations. To the best of our knowledge, this work develops the first pipelined and data-level parallel HTML parsers. The data-level parallel parser, named HPar, achieves up to 2.4× speedup on quad-core devices. This work demonstrates the feasibility of efficient, parallel HTML parsing for the first time and offers a set of novel insights for parallel HTML parsing.
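The abstract names speculative data-level parallelization as the core technique. The idea is to split the input into chunks and guess the tokenizer state at each chunk boundary. The chunks are then processed in parallel, each guess is validated against the true state left by the previous chunk, and any mis-speculated chunk is redone.

The sketch below is a toy illustration of that general pattern, not HPar's actual algorithm. The three states (`DATA`, `TAG`, `QUOTED`) are a hypothetical simplification, far smaller than the HTML tokenizer's real state set. The function names `lex_chunk`, `sequential_lex`, and `speculative_parallel_lex` are likewise hypothetical. Because of CPython's GIL, the thread pool shows the structure of the computation but should not be expected to yield real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy tokenizer states (not the HTML spec's state machine).
DATA, TAG, QUOTED = "DATA", "TAG", "QUOTED"

def lex_chunk(text, state):
    """Label each char 'D' (text content) or 'M' (markup), starting in `state`.
    Returns (labels, end_state)."""
    labels = []
    for ch in text:
        if state == DATA:
            if ch == "<":
                state = TAG
                labels.append("M")
            else:
                labels.append("D")
        elif state == TAG:
            labels.append("M")
            if ch == ">":
                state = DATA
            elif ch == '"':
                state = QUOTED
        else:  # QUOTED: a '>' inside an attribute value does not close the tag
            labels.append("M")
            if ch == '"':
                state = TAG
    return labels, state

def sequential_lex(doc):
    """Reference result: one pass over the whole document from DATA."""
    return lex_chunk(doc, DATA)[0]

def speculative_parallel_lex(doc, n_chunks=4, guess=DATA):
    """Lex chunks in parallel from a guessed start state, then validate in order.
    Returns (labels, number_of_chunks_that_had_to_be_re_lexed)."""
    size = max(1, -(-len(doc) // n_chunks))  # ceiling division
    chunks = [doc[i:i + size] for i in range(0, len(doc), size)]
    # The first chunk's start state is known; later ones are speculated.
    starts = ([DATA] + [guess] * (len(chunks) - 1)) if chunks else []
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lex_chunk, chunks, starts))
    labels, state, rollbacks = [], DATA, 0
    for chunk, start, (chunk_labels, end) in zip(chunks, starts, results):
        if start != state:  # mis-speculation: redo from the true state
            chunk_labels, end = lex_chunk(chunk, state)
            rollbacks += 1
        labels.extend(chunk_labels)
        state = end  # validated end state feeds the next chunk's check
    return labels, rollbacks
```

On `'<a href="x>y">hi</a> text'` with four chunks, the second chunk begins inside the tag, so the `DATA` guess is wrong and that chunk is re-lexed once. The final labels still match the sequential result. The guess only affects how much work is wasted, never correctness, because validation always proceeds in document order.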