Toward a scale-out data-management middleware for low-latency enterprise computing

  • Authors:
  • L. L. Fong;Y. Gao;X. R. Guerin;Y. G. Liu;T. Salo;S. R. Seelam;W. Tan;S. Tata

  • Affiliations:
  • IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Software Group, Application and Integration Middleware Software, Emerging Technology Institute, Durham, NC;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Thomas J. Watson Research Center, Yorktown Heights, NY;IBM Research Division, Almaden Research Center, San Jose, CA

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Emerging transactional workloads from Internet and mobile commerce require low-latency, massive-scale, and integrated data analytics to enhance user experience and to improve up-selling opportunities. These analytics require new application platforms that must be able to absorb large volumes of data, provide low-latency access to the data, and cache data objects to improve access times in distributed environments. This paper reports on recent technologies built at IBM Research to address challenges in data access latency, data ingestion, and caching in the exemplary context of an online product recommendation application. We describe three technologies related to the issues and optimizations of key-value data object store and access. First, we describe the architecture of a global secondary index to greatly improve data access latency of Hadoop™ Database (HBase™), an open-source key-value distributed data store. Second, we present an in-memory write-ahead log feature on HBase that significantly improves write operations for high-volume data ingestion. Third, we detail an innovative distributed caching system that exploits low-latency interconnects to use hash maps of data keys on each server for local lookup, while data resides and are accessed across clustered systems. The distributed cache can achieve a 100-to 1,000-fold performance gain over many caching methods. These technologies together form some necessary building blocks for a next-generation data-centric middleware for integrated transaction and analytic workloads.