Data mining on an OLTP system (nearly) for free

Authors:
Erik Riedel; Christos Faloutsos; Gregory R. Ganger; David F. Nagle
Affiliations:
Hewlett-Packard Laboratories, Palo Alto, California; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Venue:
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Year:
2000

Abstract

This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to consistently provide one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head “passes over” them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.
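
The idea of reading background blocks as the head "passes over" them can be illustrated with a toy model. The sketch below is not the authors' scheduler. It assumes a single track of numbered sectors, ignores seeks and timing entirely, and uses hypothetical names (`serve_foreground`, `run`, `wanted`). It only shows that sectors passed during the rotational wait for a foreground (OLTP) request can be collected for a background (Data Mining) scan at no extra rotation cost.

```python
def serve_foreground(head, target, sectors_per_track, wanted):
    """Rotate from `head` to `target` on one track (toy model, no seeks).

    Every sector passed on the way that the background scan still wants
    is picked up "for free". Returns the new head position and the list
    of background sectors collected during this rotation.
    """
    picked = []
    pos = head
    while pos != target:
        if pos in wanted:
            wanted.remove(pos)      # background scan no longer needs it
            picked.append(pos)
        pos = (pos + 1) % sectors_per_track
    # The foreground sector itself is read here; the head ends just past it.
    return (target + 1) % sectors_per_track, picked


def run(foreground_targets, sectors_per_track, wanted, head=0):
    """Serve foreground requests in order, collecting free background reads."""
    free_reads = []
    for target in foreground_targets:
        head, picked = serve_foreground(head, target, sectors_per_track, wanted)
        free_reads.extend(picked)
    return free_reads


if __name__ == "__main__":
    background = set(range(8))          # background scan wants the whole track
    free = run([3, 1], 8, background)   # two foreground requests: sectors 3, 1
    print(free)         # [0, 1, 2, 4, 5, 6, 7]
    print(background)   # {3}
```

In this toy run, two foreground requests let the background scan collect seven of the eight sectors on the track without any rotation being spent on the background workload alone. A real drive would also have to account for seek time, multiple tracks, and the time budget available before the foreground request must be served.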