XML is a more desirable format for modeling and storing clinical data in EMR (electronic medical record) applications because of its extensibility. However, existing EMR systems are either built on top of an RDBMS or a file system, or lack support for complex, large-scale healthcare applications such as treatment effectiveness analysis and procedure optimization. SAP Technology Lab, China is developing a cloud-enabled information appliance, Xbase, built on top of Hadoop; it is the first XML-based information appliance designed specifically for large-scale, complex healthcare applications. XML presents a different set of challenges for query processing, indexing, parallelism, and distributed computing when using Hadoop's existing APIs, its HDFS storage infrastructure, and its MapReduce framework. In this paper, we describe the system architecture and internal design of Xbase, as well as how indexing is mapped to the RDBMS and Hadoop. We also discuss why we selected Hadoop over other candidates, such as HBase, Google's Bigtable, and Hive.