HPar: A practical parallel parser for HTML--taming HTML complexities for parallel parsing

Authors:
Zhijia Zhao;Michael Bebenita;Dave Herman;Jianhua Sun;Xipeng Shen
Affiliations:
College of William and Mary, Williamsburg, VA, USA;Mozilla Corporation, CA, USA;Mozilla Corporation, CA, USA;College of William and Mary, Williamsburg, VA, USA;College of William and Mary, Williamsburg, VA, USA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

Parallelizing HTML parsing is challenging because of the complexity of HTML documents and the inherent dependencies in the HTML parsing algorithm. As a result, despite numerous studies of parallel parsing, HTML parsing remains sequential today. It is one of the final barriers to fully parallelizing browser operations and thereby minimizing the browser’s response time, an important factor in user experience, especially on portable devices. This article provides a comprehensive analysis of the special complexities of parallel HTML parsing and presents a systematic exploration of how to overcome those difficulties through specially designed speculative parallelization. To the best of our knowledge, this work develops the first pipelining and data-level parallel HTML parsers. The data-level parallel parser, named HPar, achieves up to 2.4× speedup on quad-core devices. This work demonstrates the feasibility of efficient parallel HTML parsing for the first time and offers a set of novel insights for parallel HTML parsing.