In this paper we present an unsupervised method for finding anomalous records in categorical or mixed datasets. Because the data in many problems consist mostly of normal records with a small minority of anomalies, many approaches build a model from the training data and compare the test records against it. Instead of building a model, we keep track of the number of occurrences of different attribute-value combinations. We also consider a more meaningful definition of anomalies and incorporate the Bayesian network structure into it. A score is defined for each test record: we combine the supports of different rules according to the Bayesian network structure in order to determine the label of each test instance. The results show that our proposed method achieves a higher or similar F-measure and precision compared to a Bayesian-network-based approach in all cases.
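To make the counting idea concrete, here is a minimal Python sketch. It is not the scoring technique of the paper, whose details the abstract does not give. It assumes a known network structure supplied as a hypothetical `structure` mapping from each attribute to its parent attributes, counts attribute-value combinations in the training records, and combines smoothed supports as a sum of negative logarithms. The names `count_combinations` and `anomaly_score`, the smoothing parameters `alpha` and `n_values`, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from math import log


def count_combinations(records, structure):
    """Count occurrences of attribute-value combinations along the structure."""
    joint = Counter()   # (attribute, its value, parent values) -> count
    parent = Counter()  # (attribute, parent values) -> count
    for rec in records:
        for child, parents in structure.items():
            pvals = tuple(rec[p] for p in parents)
            joint[(child, rec[child], pvals)] += 1
            parent[(child, pvals)] += 1
    return joint, parent


def anomaly_score(rec, structure, joint, parent, alpha=1.0, n_values=2):
    """Sum of negative log smoothed supports; higher means more unusual.

    alpha and n_values are assumed smoothing settings (add-alpha smoothing
    with n_values possible values per attribute), not values from the paper.
    """
    score = 0.0
    for child, parents in structure.items():
        pvals = tuple(rec[p] for p in parents)
        num = joint[(child, rec[child], pvals)] + alpha
        den = parent[(child, pvals)] + alpha * n_values
        score += -log(num / den)
    return score


# Toy example with a hypothetical two-attribute structure: port depends on protocol.
structure = {"protocol": (), "port": ("protocol",)}
train = ([{"protocol": "tcp", "port": "80"}] * 8
         + [{"protocol": "udp", "port": "53"}] * 2)

joint, parent = count_combinations(train, structure)
common = anomaly_score({"protocol": "tcp", "port": "80"}, structure, joint, parent)
rare = anomaly_score({"protocol": "tcp", "port": "53"}, structure, joint, parent)
```

In this toy run the record combining `tcp` with port `53` never occurs in the training data, so it receives a higher score than the frequent `tcp`/`80` record. A threshold on the score would then be needed to assign labels; how that threshold is chosen is left open here.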