One of the central tasks in managing, monitoring and mining data streams is that of identifying outliers. There is a long history of study of various outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of "deviants" from Jagadish et al. [Mining Deviants in a Time Series Database] as outliers. Deviants are based on one of the most fundamental statistical concepts, the standard deviation (or variance). Formally, deviants are defined in terms of a representation sparsity metric: they are values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima or minima, but rather appropriate local aberrations. Deviants are known to be of great mining value in time series databases.

We present the first known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in the data size) and are able to quickly find deviants at any instant, as the data stream evolves over time. For all versions of this problem (univariate vs. multivariate time series, optimal vs. near-optimal vs. heuristic solutions, offline vs. streaming), our algorithms share the same framework of maintaining a hierarchical set of candidate deviants that are updated as the time series data is progressively revealed. We show experimentally, using real network traffic data (SNMP aggregate time series) as well as synthetic data, that our algorithm is remarkably accurate in determining the deviants.
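
To make the definition of a deviant concrete, the following is a minimal offline sketch, not the streaming algorithm of this paper. It assumes the simplest possible compressed representation, a single bucket that stores only the mean of the remaining values, and measures representation quality by the sum of squared errors (SSE) around that mean. The function names `sse` and `find_deviants` and the sample series are hypothetical, and the exhaustive search is practical only for small inputs.

```python
from itertools import combinations


def sse(values):
    """Sum of squared errors of the values around their mean (0 for empty input)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_deviants(series, k):
    """Return (indices, remaining_sse) for the k points whose removal
    minimizes the SSE of the remaining points around their mean.

    Assumption: one bucket summarizes the whole series. Exhaustive search
    over all k-subsets, so this is only suitable for small series.
    """
    if not 0 <= k <= len(series):
        raise ValueError("k must be between 0 and len(series)")
    best_idx, best_err = None, float("inf")
    for idx in combinations(range(len(series)), k):
        removed = set(idx)
        rest = [v for i, v in enumerate(series) if i not in removed]
        err = sse(rest)
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err


# Hypothetical sample series with one obvious aberration at position 3.
series = [10, 11, 10, 50, 11, 10, 12]
deviant_indices, remaining_error = find_deviants(series, k=1)
```

With a single bucket the chosen deviants tend to be extreme values. The local character described in the abstract would only appear once the series is summarized by several buckets, each with its own mean, so that a value is judged against its own bucket rather than against the whole series.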