Toward a scale-out data-management middleware for low-latency enterprise computing

Authors:
L. L. Fong;Y. Gao;X. R. Guerin;Y. G. Liu;T. Salo;S. R. Seelam;W. Tan;S. Tata
Affiliations:
IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Software Group, Application and Integration Middleware Software, Emerging Technology Institute, Durham, NC;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Almaden Research Center, San Jose, CA
Venue:
IBM Journal of Research and Development
Year:
2013



Abstract

Emerging transactional workloads from Internet and mobile commerce require low-latency, massive-scale, integrated data analytics to enhance the user experience and to improve up-selling opportunities. These analytics call for new application platforms that can absorb large volumes of data, provide low-latency access to that data, and cache data objects to improve access times in distributed environments. This paper reports on recent technologies built at IBM Research to address challenges in data access latency, data ingestion, and caching, using an online product recommendation application as the running example. We describe three technologies that address the storage of, and access to, key-value data objects. First, we describe the architecture of a global secondary index that greatly improves the data access latency of Hadoop™ Database (HBase™), an open-source distributed key-value data store. Second, we present an in-memory write-ahead log feature for HBase that significantly improves write performance for high-volume data ingestion. Third, we detail a distributed caching system that exploits low-latency interconnects: each server keeps a hash map of data keys for local lookup, while the data itself resides on, and is accessed across, the clustered systems. The distributed cache can achieve a 100- to 1,000-fold performance gain over many caching methods. Together, these technologies form some of the necessary building blocks for a next-generation data-centric middleware for integrated transaction and analytic workloads.
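
The split between local key lookup and remote data access in the third technology can be illustrated with a minimal, single-process sketch. It is not the system described in the paper. The names `Node` and `DistributedCache`, the CRC32-based placement, and the append-only memory layout are all assumptions made for illustration, and a plain in-process read stands in for a remote read over a low-latency interconnect.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one server's cache memory region."""
    memory: bytearray = field(default_factory=bytearray)

    def append(self, value: bytes) -> tuple[int, int]:
        # Store the value at the end of the region; return (offset, length).
        offset = len(self.memory)
        self.memory += value
        return offset, len(value)

    def read(self, offset: int, length: int) -> bytes:
        # Assumption: in a real deployment this would be a remote read
        # over the interconnect; here it is an ordinary local slice.
        return bytes(self.memory[offset:offset + length])


class DistributedCache:
    """Key map consulted locally; values live on whichever node owns them."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [Node() for _ in range(num_nodes)]
        # One dict stands in for the key map kept on each server.
        # It maps key -> (node id, offset, length).
        self.key_map: dict[str, tuple[int, int, int]] = {}

    def put(self, key: str, value: bytes) -> None:
        # Assumed placement policy: CRC32 of the key modulo the node count.
        node_id = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        offset, length = self.nodes[node_id].append(value)
        # An overwrite repoints the key; the old bytes are not reclaimed here.
        self.key_map[key] = (node_id, offset, length)

    def get(self, key: str) -> bytes | None:
        location = self.key_map.get(key)  # local hash-map lookup
        if location is None:
            return None
        node_id, offset, length = location
        return self.nodes[node_id].read(offset, length)  # single data fetch


cache = DistributedCache(num_nodes=3)
cache.put("product:42", b"wireless mouse")
print(cache.get("product:42"))  # prints b'wireless mouse'
```

In this sketch a lookup costs one local dictionary access plus one read of the value's bytes, which is the property the abstract attributes to keeping key maps on each server. Keeping those maps consistent across servers is not modeled.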