Maintaining Implicated Statistics in Constrained Environments

Authors:
Yannis Sismanis;Nick Roussopoulos
Affiliations:
I.B.M. Almaden Research Center;University of Maryland
Venue:
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year:
2005

Citing 25
Cited 1

Probabilistic counting algorithms for data base applications

Journal of Computer and System Sciences
A linear-time probabilistic counting algorithm for database applications

ACM Transactions on Database Systems (TODS)
Approximate inference of functional dependencies from relations

ICDT '92 Selected papers of the fourth international conference on Database theory
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Random sampling for histogram construction: how much is enough?

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments

Journal of Computer and System Sciences
Towards estimation error guarantees for distinct values

PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Independence is good: dependency-based histogram synopses for high-dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Flash crowds and denial of service attacks: characterization and implications for CDNs and web sites

Proceedings of the 11th international conference on World Wide Web
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Characterizing memory requirements for queries over continuous data streams

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Maintaining stream statistics over sliding windows: (extended abstract)

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Data mining-based intrusion detectors: an overview of the columbia IDS project

ACM SIGMOD Record
Temporal and spatio-temporal aggregations over data streams using multiple time granularities

Information Systems - Special issue: Best papers from EDBT 2002
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports

Proceedings of the 27th International Conference on Very Large Data Bases
Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Learning Probabilistic Relational Models

IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
Counting Distinct Elements in a Data Stream

RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Automatically inferring patterns of resource consumption in network traffic

Proceedings of the 2003 conference on Applications, technologies, architectures, and protocols for computer communications
Information sharing across private databases

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Processing set expressions over continuous update streams

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
CORDS: automatic discovery of correlations and soft functional dependencies

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Approximate frequency counts over data streams

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Finding hierarchical heavy hitters in data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Toward automated large-scale information integration and discovery

Data Management in a Connected World

Quantified Score

Hi-index	0.00

Visualization

Abstract

Aggregated information regarding implicated entities is critical for online applications like network management, traffic characterization or identifying patters of resource consumption. Recently there has been a flurry of research for online aggregation on streams (like quantiles, hot items, hierarchical heavy hitters) but surprizingly the problem of summarizing implicated information in stream data has received no attention. As an example, consider an IP-network and the implication source 驴 destination. Flash crowds, 驴 such as those that follow recent sport events (like the olympics) or seek information regarding catastrophic events 驴 or denial of service attacks direct a large volume of traffic from a huge number of sources to a very small number of destinations. In this paper we present novel randomized algorithms for monitoring such implications with constraints in both memory and processing power for environments like network routers. Our experiments demonstrate several factors of improvements over straightforward approaches.