Storing and querying ordered XML using a relational database system
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Indexing and Querying XML Data for Regular Path Expressions
Proceedings of the 27th International Conference on Very Large Data Bases
XML parsing: a threat to database performance
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Service-Oriented Computing: Key Concepts and Principles
IEEE Internet Computing
XML screamer: an integrated approach to high performance XML parsing, validation and deserialization
Proceedings of the 15th international conference on World Wide Web
Data parallel Haskell: a status report
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
A Static Load-Balancing Scheme for Parallel XML Parsing on Multicore CPUs
CCGRID '07 Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A High Performance Schema-Specific XML Parser
E-SCIENCE '07 Proceedings of the Third IEEE International Conference on e-Science and Grid Computing
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
A framework for service-oriented computing with C and C++ Web service components
ACM Transactions on Internet Technology (TOIT)
High performance XML parsing using parallel bit stream technology
CASCON '08 Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
CoDeSe: fast deserialization via code generation
Proceedings of the 2011 International Symposium on Software Testing and Analysis
Parallel scanning with bitstream addition: an XML case study
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
XLynx—An FPGA-based XML filter for hybrid XQuery processing
ACM Transactions on Database Systems (TODS) - Invited papers issue
HPar: A practical parallel parser for HTML--taming HTML complexities for parallel parsing
ACM Transactions on Architecture and Code Optimization (TACO)
Parallel labeling of massive XML data with MapReduce
The Journal of Supercomputing
The Extensible Markup Language (XML) has become the de facto standard for information representation and interchange on the Internet. Parsing is a core operation that must be performed on an XML document before it can be accessed and manipulated, and it is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost parsing performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores and will offer a high degree of parallelism in hardware. We propose ParDOM, a data-parallel algorithm for XML DOM parsing that builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, the XML document is partitioned into chunks, which are parsed in parallel. In the second phase, the partial DOM node tree structures created during the first phase are linked together, also in parallel, to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme: each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data-parallel programming model that supports map and sort operations. Through empirical evaluation, we show that ParDOM scales better than PXP [23], a recently proposed parallel DOM parsing algorithm, on commodity multicore processors. Furthermore, ParDOM can process a wide variety of XML datasets with complex structures that PXP fails to parse.
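The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ParDOM implementation: the names `split_chunks`, `parse_chunk`, `link`, `parse_dom`, and `shape` are hypothetical, the tag regex assumes input without comments, CDATA sections, or `>` inside attribute values, text content is ignored, and the linking phase here is sequential over chunks rather than parallel and sort-based as in the paper. The sketch only shows how chunks with unmatched start and end tags can be parsed independently and then stitched into one tree.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Simplified tag pattern (assumption: no comments, CDATA, or '>' in attributes).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


def split_chunks(doc, n):
    """Cut doc into at most n pieces; each cut is moved to the next '<'
    so that no tag is split across two chunks."""
    cuts = [0]
    step = max(1, len(doc) // n)
    for i in range(1, n):
        pos = doc.find("<", max(i * step, cuts[-1]))
        if pos == -1:
            break
        if pos > cuts[-1]:
            cuts.append(pos)
    cuts.append(len(doc))
    return [doc[a:b] for a, b in zip(cuts, cuts[1:])]


def parse_chunk(chunk):
    """Phase 1: build partial trees for one chunk in isolation.
    Returns (items, still_open): items lists, in document order, the
    chunk's top-level nodes and its end tags that have no start tag in
    this chunk; still_open is the path of nodes left unclosed."""
    items, stack = [], []
    for m in TAG.finditer(chunk):
        closing, name, self_closing = m.groups()
        if closing:
            if stack:
                if stack.pop().name != name:
                    raise ValueError("mismatched end tag: " + name)
            else:
                items.append(("close", name))  # start tag is in an earlier chunk
        else:
            node = Node(name)
            if stack:
                stack[-1].children.append(node)
            else:
                items.append(("node", node))
            if not self_closing:
                stack.append(node)
    return items, stack


def link(parts):
    """Phase 2 (sequential here): stitch the partial trees together."""
    root = Node("#document")
    open_nodes = [root]
    for items, still_open in parts:
        for kind, value in items:
            if kind == "close":
                if len(open_nodes) < 2 or open_nodes[-1].name != value:
                    raise ValueError("mismatched end tag: " + value)
                open_nodes.pop()
            else:
                open_nodes[-1].children.append(value)
        open_nodes.extend(still_open)
    if len(open_nodes) != 1:
        raise ValueError("unclosed element: " + open_nodes[-1].name)
    return root


def parse_dom(doc, n_chunks=4):
    chunks = split_chunks(doc, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(parse_chunk, chunks))  # the "map" step
    return link(parts)


def shape(node):
    """Nested (name, children) view of a tree, handy for comparisons."""
    return (node.name, [shape(c) for c in node.children])
```

For example, `shape(parse_dom('<a><b x="1"><c/></b><d>text</d></a>', 3))` yields the same nested structure regardless of how many chunks are used. In CPython the thread pool does not give real CPU parallelism because of the global interpreter lock; it stands in for whatever data-parallel map primitive is available. The work done in `link` grows with the number of chunks and unmatched tags rather than with the document size, which is the property that makes the chunk-then-link split attractive.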