This paper proposes a new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large, high-dimensional data set. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are the points with the largest weights. HilOut uses a space-filling curve to linearize the data set and consists of two phases. The first phase provides an approximate solution, within a rough factor, after at most d + 1 sorts and scans of the data set, with time cost quadratic in d and linear in N and in k, where d is the number of dimensions and N is the number of points in the data set. During this phase, the algorithm isolates candidate outliers and reduces this candidate set at each iteration. If the size of the set becomes n, the algorithm stops and reports the exact solution. The second phase computes the exact solution with a final scan that further examines the candidate outliers remaining after the first phase. Experimental results show that the algorithm always stops during the first phase, reporting the exact solution, after far fewer than d + 1 steps. We present both an in-memory and a disk-based implementation of HilOut, together with a thorough scaling analysis on real and synthetic data sets showing that the algorithm scales well in both cases.
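The weight-based definition can be illustrated with a brute-force baseline. The sketch below is not HilOut: it uses no space-filling curve and compares every pair of points, so it only shows what "top n outliers by weight" means. The function names `knn_weight` and `top_n_outliers` are hypothetical, and Euclidean distance is an assumption, since the abstract does not name a metric.

```python
import heapq
import math


def knn_weight(points, i, k):
    """Weight of points[i]: sum of distances to its k nearest neighbors.

    Assumes Euclidean distance; the point itself is excluded.
    """
    dists = [math.dist(points[i], q) for j, q in enumerate(points) if j != i]
    return sum(heapq.nsmallest(k, dists))


def top_n_outliers(points, n, k):
    """Indices of the n points with the largest weights.

    Brute force: every pair of points is compared, so the cost is
    quadratic in the number of points.
    """
    weights = [knn_weight(points, i, k) for i in range(len(points))]
    return heapq.nlargest(n, range(len(points)), key=weights.__getitem__)


if __name__ == "__main__":
    # Four points on a unit square plus one point far from all of them.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    print(top_n_outliers(data, n=1, k=2))
```

In the example data, the isolated point at (10, 10) has by far the largest weight for k = 2, so it is the single top outlier. The quadratic cost of this baseline is what an approach based on linearizing the data set aims to avoid.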