From outliers to prototypes: Ordering data

Authors:
Stefan Harmeling;Guido Dornhege;David Tax;Frank Meinecke;Klaus-Robert Müller
Affiliations:
Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany and Department of Computer Science, University of Potsdam, August-Bebel-Strasse 89, 14482 Potsdam, Germany;Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany;Delft University of Technology, Information and Communication Theory Group, P.O. Box 5031, 2600 GA, Delft, The Netherlands;Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany;Fraunhofer FIRST.IDA, Kekuléstrasse 7, 12489 Berlin, Germany and Department of Computer Science, University of Potsdam, August-Bebel-Strasse 89, 14482 Potsdam, Germany
Venue:
Neurocomputing
Year:
2006

Abstract

We propose simple and fast nearest-neighbor methods that order the objects of a high-dimensional data set from typical points to untypical points. On the one hand, we show that these easy-to-compute orderings allow us to detect outliers (i.e., very untypical points) with a performance comparable to or better than that of other, often much more sophisticated, methods. On the other hand, we show how to use these orderings to detect prototypes (i.e., very typical points), which facilitate exploratory data analysis algorithms such as noisy nonlinear dimensionality reduction and clustering. Comprehensive experiments demonstrate the validity of our approach.
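
To make the idea of a nearest-neighbor ordering concrete, the following is a minimal sketch and not necessarily the exact indices used in the paper. It assumes one plausible score, the mean Euclidean distance from each point to its k nearest neighbors, and sorts the points by it. The function names, the choice k=5, and the synthetic data are illustrative assumptions.

```python
import math
import random


def knn_scores(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbors.

    Assumed score for illustration: a small value means the point lies in a
    dense region (typical), a large value means it is isolated (untypical).
    Requires k < len(points).
    """
    scores = []
    for i, p in enumerate(points):
        # Distances to all other points, smallest first (brute force, O(n^2)).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def order_typical_to_untypical(points, k=5):
    """Return point indices sorted from most typical to most untypical."""
    scores = knn_scores(points, k)
    order = sorted(range(len(points)), key=scores.__getitem__)
    return order, scores


# Synthetic example: 50 points from a 10-dimensional Gaussian plus one
# far-away point appended at index 50.
random.seed(0)
cluster = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]
points = cluster + [[8.0] * 10]

order, scores = order_typical_to_untypical(points, k=5)
prototypes = order[:3]   # candidate prototypes: most typical points
outliers = order[-1:]    # candidate outlier: most untypical point
print("prototypes:", prototypes, "outliers:", outliers)
```

Reading the ordering from the front gives prototype candidates and reading it from the back gives outlier candidates. The brute-force distance computation is quadratic in the number of points, so a spatial index would be the natural replacement for larger data sets.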