Hancock: A language for analyzing transactional data streams

Authors:
Corinna Cortes;Kathleen Fisher;Daryl Pregibon;Anne Rogers;Frederick Smith
Affiliations:
AT&T Labs, New York, NY;AT&T Labs, NJ;AT&T Labs, New York, NY;AT&T Labs, Chicago, IL;AT&T Labs, Natick, MA
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2004

Abstract

Massive transaction streams present a number of opportunities for data mining techniques. The transactions in such streams might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover how the transactors, for example, credit-card numbers or IP addresses, use the associated services.

Over the past 5 years, we have computed evolving profiles (called signatures) of transactors in several very large data streams. The signature for each transactor captures the salient features of his or her behavior through time. Programs for processing signatures must be highly optimized because of the size of the data stream (several gigabytes per day) and the number of signatures to maintain (hundreds of millions). Originally, we wrote such programs directly in C, but because these programs often sacrificed readability for performance, they were difficult to verify and maintain.

Hancock is a domain-specific language we created to express computationally efficient signature programs cleanly. In this paper, we describe the obstacles to computing signatures from massive streams and explain how Hancock addresses these problems. For expository purposes, we present Hancock using a running example from the telecommunications industry; however, the language itself is general and applies equally well to other data sources.