As the volume of data that enterprises produce grows rapidly, they are adopting large-scale data processing platforms such as Hadoop® to store and manage that data and to run deep analytics that yield actionable insights from their "big data." At IBM Research - Almaden, we have been helping enterprise customers build solutions that exploit data-intensive analytics. Our experience with actual users has given us an extensive understanding of the platform requirements these solutions impose. Our goal is to provide a powerful analytics platform, which we call eXtreme Analytics Platform (XAP), that can be used to create solutions for customer problems that have not been economically feasible to solve until now. XAP provides Jaql [a JavaScript® Object Notation (JSON) query language], a scripting language for specifying data flows; tools and techniques to optimize the runtime execution of these flows; an improved task scheduler; connectors to data warehouses; and libraries for advanced analytics. Many of these technologies have been transferred to the IBM InfoSphere BigInsights™ product. In this paper, we describe the overall design principles and technology of XAP.