In this paper we present a genetic-algorithm-based solution to the outlier detection problem. The essential idea is to define outliers by examining those projections of the data along which the data points behave abnormally or inconsistently, where abnormality is defined in terms of sparsity values. We use a partitioning method to divide the data set into groups such that all the objects in a group can be considered to behave similarly, and we then identify the groups that contain outliers. The algorithm assigns each such group an 'outlier-ness' value, a relative measure of how strongly the group deviates. An evolutionary search technique is employed to determine the projections of the data over which the outliers can be identified, and a new crossover operator is defined for the evolutionary search. A new data structure, the grid count tree (GCT), quickly determines the number of points within any grid cell defined over the projected space and thereby speeds up computation of the sparsity factor. The proposed method is applicable to both numeric and categorical attributes. The search complexity of the GCT traversal algorithm is provided. Results are demonstrated on both artificial and real-life data sets, including four gene expression data sets.
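To make the notion of sparsity over a projection concrete, the following is a minimal sketch and not the method of this paper. It assumes equal-width grid ranges over a known value interval and a sparsity coefficient of the form (n(D) - N f^k) / sqrt(N f^k (1 - f^k)), with f = 1/phi, as commonly used in projection-based outlier detection; the sparsity factor defined in the paper may differ. A plain dictionary of cell counts stands in for the grid count tree, and the names `cell_counts`, `sparsity`, `phi`, `lo`, and `hi` are hypothetical.

```python
import math
from collections import Counter


def cell_counts(points, dims, phi, lo=0.0, hi=1.0):
    """Count points per grid cell in the projection onto `dims`.

    Assumption: every selected dimension is split into `phi`
    equal-width ranges over the interval [lo, hi].
    """
    width = (hi - lo) / phi
    counts = Counter()
    for p in points:
        # Clamp the index so that a value equal to `hi` falls in the last range.
        cell = tuple(min(int((p[d] - lo) / width), phi - 1) for d in dims)
        counts[cell] += 1
    return counts


def sparsity(count, n_points, k, phi):
    """Assumed sparsity coefficient of a k-dimensional cell.

    Negative values mean the cell holds fewer points than expected
    if points were spread evenly over the phi**k cells.
    """
    f_k = (1.0 / phi) ** k
    expected = n_points * f_k
    std = math.sqrt(n_points * f_k * (1.0 - f_k))
    return (count - expected) / std


# Illustrative data only: four 3-D points, projected onto dimensions 0 and 1.
points = [(0.1, 0.1, 0.5), (0.15, 0.2, 0.9), (0.2, 0.1, 0.1), (0.9, 0.9, 0.3)]
counts = cell_counts(points, dims=(0, 1), phi=2)
scores = {cell: sparsity(c, len(points), k=2, phi=2) for cell, c in counts.items()}
```

In this sketch an evolutionary search would use the sparsity of a candidate projection's cells as its fitness signal, and a structure such as the GCT would replace the per-projection recount performed by `cell_counts`.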