Mining Deviants in Time Series Data Streams

Authors:
S. Muthukrishnan; Rahul Shah; Jeffrey Scott Vitter
Affiliations:
Rutgers University; Purdue University; Purdue University
Venue:
SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Year:
2004

Abstract

One of the central tasks in managing, monitoring and mining data streams is that of identifying outliers. There is a long history of study of various outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of "deviants" from Jagadish et al. [Mining Deviants in a Time Series Database] as outliers. Deviants are based on one of the most fundamental statistical concepts, standard deviation (or variance). Formally, deviants are defined based on a representation sparsity metric, i.e., deviants are values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima/minima, but rather appropriate local aberrations. Deviants are known to be of great mining value in time series databases.

We present the first known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in data size) and are able to quickly find deviants at any instant, as the data stream evolves over time. For all versions of this problem (univariate vs. multivariate time series, optimal vs. near-optimal vs. heuristic solutions, offline vs. streaming), our algorithms have the same framework of maintaining a hierarchical set of candidate deviants that are updated as the time series data gets progressively revealed. We show experimentally, using real network traffic data (SNMP aggregate time series) as well as synthetic data, that our algorithm is remarkably accurate in determining the deviants.
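To make the definition of deviants concrete, the following is a minimal offline sketch, not the streaming algorithm of the paper. It assumes the simplest possible compressed representation, a single bucket in which every remaining value is approximated by the mean, so the representation error is the sum of squared deviations from the mean. The function names `sse` and `find_deviants` and the sample series are hypothetical, chosen for illustration only. The search is brute force over all size-k subsets, which is only practical for tiny inputs.

```python
from itertools import combinations


def sse(values):
    """Sum of squared errors when every value is approximated by the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_deviants(series, k):
    """Return (indices, error) for the k points whose removal minimizes
    the single-bucket SSE of the remaining points (brute-force search)."""
    best_indices, best_error = (), sse(series)
    for candidate in combinations(range(len(series)), k):
        removed = set(candidate)
        remaining = [v for i, v in enumerate(series) if i not in removed]
        error = sse(remaining)
        if error < best_error:
            best_indices, best_error = candidate, error
    return best_indices, best_error


# Hypothetical toy series: mostly near 10, with two aberrant values.
series = [10, 11, 10, 12, 40, 11, 10, 9, 2, 10]
deviants, error = find_deviants(series, 2)
print(deviants, error)
```

With a single bucket, the selected points tend to be the values farthest from the mean. The "local aberration" behavior described in the abstract would arise only with a richer representation (for example, several buckets, each with its own mean), which this sketch deliberately leaves out.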