The volume of XML data has become enormous and is still growing quickly, as much data is now encoded in XML by virtue of its simplicity and extensibility. Although tree labeling plays a crucial role in XML query processing, conventional labeling algorithms are all sequential, so they cannot label a large volume of XML data in a timely manner. To address this issue, we devise parallel tree labeling algorithms for massive XML data, focusing specifically on how to label a single large XML file efficiently in parallel. We first propose parallel versions of two prominent tree labeling schemes based on the MapReduce framework. We then present techniques for runtime workload balancing and data repartitioning to solve performance issues caused by data skew and by an inherent limitation of MapReduce. Through extensive experiments with synthetic and real-world datasets on 15 nodes, we show that our parallel labeling algorithms are up to 17 times faster than conventional algorithms and remain robust against data skew.
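The abstract does not spell out the algorithms, so the following is only an illustrative sketch of why labeling a single large file can be split across workers. It assumes an interval-style (start, end, level) labeling scheme, which may or may not be one of the two schemes the paper uses. It also assumes a simplified tokenizer that keeps only plain start and end tags, with positions measured as tag indices. Each chunk is labeled independently with chunk-local positions and depths, as a map task would do. A cheap sequential pass over the per-chunk summaries then shifts the local labels by prefix-summed offsets and matches start tags left open in one chunk with end tags in later chunks. `TAG`, `map_chunk`, and `label_parallel` are hypothetical names, and `ThreadPoolExecutor` merely stands in for map tasks. This is not Hadoop or the authors' implementation.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Simplified tokenizer (assumption): plain start/end tags only; text,
# attributes, and self-closing tags are ignored. Yields (slash, name) pairs.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>")


def map_chunk(tokens):
    """Label one chunk using chunk-local positions and depths.

    Returns (labels, pending, orphans, depth_delta):
      labels  -- (name, start, end, level) for elements opened and closed here
      pending -- (name, start, level) for start tags still open at chunk end
      orphans -- (name, pos) for end tags whose start tag is in an earlier chunk
    """
    labels, stack, orphans = [], [], []
    depth = 0  # depth relative to the (unknown) depth at the chunk start
    for pos, (slash, name) in enumerate(tokens):
        if not slash:
            stack.append((name, pos, depth))
            depth += 1
        else:
            depth -= 1
            if stack:
                open_name, start, level = stack.pop()
                labels.append((open_name, start, pos, level))
            else:
                orphans.append((name, pos))
    return labels, stack, orphans, depth


def label_parallel(xml, n_chunks=4):
    tokens = TAG.findall(xml)
    size = max(1, -(-len(tokens) // n_chunks))  # ceiling division
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    with ThreadPoolExecutor() as pool:  # stand-in for parallel map tasks
        results = list(pool.map(map_chunk, chunks))

    labels, stack = [], []  # stack holds open tags carried across chunks
    pos_off = depth_off = 0  # prefix sums of chunk lengths and depth deltas
    for chunk, (local, pending, orphans, delta) in zip(chunks, results):
        for name, s, e, lvl in local:
            labels.append((name, s + pos_off, e + pos_off, lvl + depth_off))
        # All orphans precede any pending start tag within a chunk, so
        # close the carried-over start tags first (innermost first).
        for name, p in orphans:
            open_name, s, lvl = stack.pop()
            if open_name != name:
                raise ValueError(f"mismatched tags: {open_name} vs {name}")
            labels.append((open_name, s, p + pos_off, lvl))
        for name, s, lvl in pending:
            stack.append((name, s + pos_off, lvl + depth_off))
        pos_off += len(chunk)
        depth_off += delta
    return sorted(labels, key=lambda t: t[1])  # document order
```

For example, `label_parallel("<a><b><c></c></b><d></d></a>", n_chunks=3)` yields the same labels as a sequential pass: `a` spans positions 0 to 7 at level 0, and `d` spans 5 to 6 at level 1. With these labels, an element x is an ancestor of y exactly when x.start < y.start and y.end < x.end. The sketch does not model the runtime workload balancing or data repartitioning mentioned above. Its equal-size chunks are precisely what becomes unbalanced under skew.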