Massive transaction streams present a number of opportunities for data mining techniques. The transactions in such streams might represent calls on a telephone network, commercial credit card purchases, stock market trades, or HTTP requests to a web server. While historically such data have been collected for billing or security purposes, they are now being used to discover how the transactors, for example, credit-card numbers or IP addresses, use the associated services.

Over the past 5 years, we have computed evolving profiles (called signatures) of transactors in several very large data streams. The signature for each transactor captures the salient features of his or her behavior through time. Programs for processing signatures must be highly optimized because of the size of the data stream (several gigabytes per day) and the number of signatures to maintain (hundreds of millions). Originally, we wrote such programs directly in C, but because these programs often sacrificed readability for performance, they were difficult to verify and maintain.

Hancock is a domain-specific language we created to express computationally efficient signature programs cleanly. In this paper, we describe the obstacles to computing signatures from massive streams and explain how Hancock addresses these problems. For expository purposes, we present Hancock using a running example from the telecommunications industry; however, the language itself is general and applies equally well to other data sources.
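
To make the idea of an evolving signature concrete, the following is a minimal Python sketch, not Hancock code and not the authors' method. It assumes each signature is a single number per transactor and that a new batch of transactions is blended into the old value with an exponentially decaying weight. The update rule, the name `update_signatures`, the `DECAY` constant, and the sample phone numbers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical blending weight: the share of the old signature that
# survives each batch update. This value is an assumption.
DECAY = 0.85


def update_signatures(signatures, transactions, decay=DECAY):
    """Fold one batch of (transactor, value) pairs into the signatures.

    Each signature is a single float here; a realistic signature would
    hold several features per transactor.
    """
    # Aggregate the batch per transactor first, so each stored signature
    # is read and written once per batch rather than once per transaction.
    batch_totals = defaultdict(float)
    for transactor, value in transactions:
        batch_totals[transactor] += value

    for transactor, total in batch_totals.items():
        old = signatures.get(transactor)
        if old is None:
            # First sighting: the batch total becomes the initial signature.
            signatures[transactor] = total
        else:
            # Blend the old signature with the new batch (assumed rule).
            signatures[transactor] = decay * old + (1.0 - decay) * total
    return signatures


# Usage: two daily batches of (phone number, call minutes).
signatures = {}
day1 = [("555-0100", 120.0), ("555-0101", 30.0), ("555-0100", 80.0)]
day2 = [("555-0100", 100.0)]
update_signatures(signatures, day1)
update_signatures(signatures, day2)
```

In this sketch a transactor that does not appear in a batch keeps its previous signature unchanged. Aggregating per transactor before touching the stored signatures reflects the concern in the abstract that, with hundreds of millions of signatures, each access to the signature store is costly.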