We consider the problem of detecting anomalies in high-arity categorical datasets. In most applications, anomalies are defined as data points that are "abnormal". Quite often we have access to data that consists mostly of normal records, along with a small percentage of unlabelled anomalous records. We are interested in the problem of unsupervised anomaly detection, where we use the unlabelled data for training and detect records that do not follow the definition of normality. A standard approach is to create a model of normal data and compare test records against it. A probabilistic approach builds a likelihood model from the training data, and records are tested for anomalies based on their complete-record likelihood under that model. For categorical attributes, Bayes nets give a standard representation of the likelihood. While this approach is good at finding outliers in the dataset, it tends to flag records whose attribute values are rare. Sometimes merely detecting rare values of an attribute is not desired, and such outliers are not considered anomalies in that context. We present an alternative definition of anomalies and propose an approach that compares records against the marginal distributions of attribute subsets. We show that this is a more meaningful way of detecting anomalies and that it performs better on semi-synthetic as well as real-world datasets.
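The contrast drawn above can be illustrated with a small sketch. The baseline scores a record by its full log-likelihood under the simplest possible Bayes net, one with no edges (all attributes independent), chosen here only for brevity; a learned network with edges would also penalize some unusual combinations. The alternative compares, for each pair of attributes, the joint frequency of the record's two values with the product of their marginals, r = P(v_i, v_j) / (P(v_i) P(v_j)), and keeps the minimum ratio. Restricting the subsets to single attributes, the add-one smoothing, the min aggregation, the toy data, and the names `fit_counts`, `independence_loglik`, and `pairwise_ratio_score` are all hypothetical choices for illustration, not the exact method of the paper.

```python
import math
from collections import Counter
from itertools import combinations


def fit_counts(records):
    """Count single attribute values and pairwise value co-occurrences."""
    single = Counter()  # (attr, value) -> count
    pair = Counter()    # (attr_i, value_i, attr_j, value_j) -> count
    distinct = [set() for _ in records[0]]
    for rec in records:
        for i, v in enumerate(rec):
            single[(i, v)] += 1
            distinct[i].add(v)
        for i, j in combinations(range(len(rec)), 2):
            pair[(i, rec[i], j, rec[j])] += 1
    # +1 reserves probability mass for a value never seen in training.
    sizes = [len(d) + 1 for d in distinct]
    return single, pair, sizes, len(records)


def independence_loglik(rec, single, sizes, n):
    """Baseline (simplified): full-record log-likelihood under an edgeless
    Bayes net, i.e. all attributes independent, with add-one smoothing."""
    return sum(math.log((single[(i, v)] + 1) / (n + sizes[i]))
               for i, v in enumerate(rec))


def pairwise_ratio_score(rec, single, pair, n):
    """Alternative (assumed form): for every attribute pair compute
    r = P(vi, vj) / (P(vi) * P(vj)) from add-one smoothed counts and return
    the minimum. Values far below 1 mean the combination is much rarer than
    its individual values predict; a rare value alone does not lower r."""
    scores = []
    for i, j in combinations(range(len(rec)), 2):
        joint = pair[(i, rec[i], j, rec[j])] + 1
        marg_i = single[(i, rec[i])] + 1
        marg_j = single[(j, rec[j])] + 1
        scores.append(joint * n / (marg_i * marg_j))
    return min(scores)


# Hypothetical toy training data: attribute B always follows attribute A,
# and C is almost always "c1" except for four records with the value "rare".
train = ([("a1", "b1", "c1")] * 48
         + [("a2", "b2", "c1")] * 48
         + [("a1", "b1", "rare")] * 4)
single, pair, sizes, n = fit_counts(train)

rare_value = ("a1", "b1", "rare")  # rare value, consistent with the others
odd_combo = ("a1", "b2", "c1")     # common values, never-seen combination

for name, rec in [("rare_value", rare_value), ("odd_combo", odd_combo)]:
    print(name,
          round(independence_loglik(rec, single, sizes, n), 2),
          round(pairwise_ratio_score(rec, single, pair, n), 3))
```

On this toy data the record containing the rare value gets the lower likelihood, while the pairwise ratio singles out ("a1", "b2", "c1"), whose values are each common but never co-occur in training. The edgeless baseline exaggerates the rare-value effect; the sketch is meant only to show what each score responds to.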