A comparison of outlier detection algorithms for ITS data

Authors:
Shuyan Chen;Wei Wang;Henk van Zuylen
Affiliations:
Transportation College, Southeast University, 210096 Nanjing, China and Civil Engineering and Geosciences, Delft University of Technology, 2600 GA Delft, The Netherlands;Transportation College, Southeast University, 210096 Nanjing, China;Civil Engineering and Geosciences, Delft University of Technology, 2600 GA Delft, The Netherlands
Venue:
Expert Systems with Applications: An International Journal
Year:
2010

Abstract

To improve the accuracy and reliability of traffic models, and to extract valuable information from collected traffic data, outlier mining has been introduced into traffic engineering to detect and analyze outliers in traffic data sets. Three typical outlier detection algorithms, namely the statistics-based approach, the distance-based approach, and the density-based local outlier approach, are described in terms of their principles, characteristics, and time complexity. The three algorithms are compared through application to intelligent transportation systems (ITS) data. Two traffic data sets of different dimensionality are used in the experiments: travel time data and traffic flow data. A number of experiments were conducted to identify outliers hidden in the data sets before building the travel time prediction model and the traffic flow fundamental diagram. In addition, artificially generated outliers are introduced into the traffic flow data to see how well the different algorithms detect them. Three strategies, based on ensemble learning, partitioning, and average LOF, are proposed to develop a better outlier recognizer. The experimental results show that these outlier mining methods are feasible and valid for detecting outliers in traffic data sets and have good potential for use in traffic engineering. The comparison and analysis presented in this paper are expected to provide some insights to practitioners who plan to use outlier mining for ITS data.
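
To make the three families of detectors named in the abstract concrete, the following is a minimal Python sketch. It is not the authors' implementation. The function names, thresholds, and toy data are hypothetical and chosen only for illustration. The statistics-based detector is assumed here to be a simple z-score rule, the distance-based detector follows the common DB(p, D) definition with the fraction taken over the other points, and the LOF variant uses exactly k nearest neighbours (ties broken by index) and assumes no duplicate points.

```python
import math
import statistics


def zscore_outliers(values, threshold=3.0):
    """Statistics-based sketch: flag values whose z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


def distance_outliers(points, radius, min_fraction):
    """Distance-based sketch (DB(p, D) style): flag a point if at least
    `min_fraction` of the other points lie farther than `radius` from it."""
    flagged = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        far = sum(1 for q in others if math.dist(p, q) > radius)
        if others and far / len(others) >= min_fraction:
            flagged.append(i)
    return flagged


def lof_scores(points, k):
    """Density-based sketch: local outlier factor of every point.

    Simplification (an assumption of this sketch): each neighbourhood holds
    exactly k points, and all points are assumed to be distinct.
    """
    n = len(points)
    neighbours = []  # per point: list of (distance, index) for its k nearest
    k_dist = []      # per point: distance to its k-th nearest neighbour
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        knn = dists[:k]
        neighbours.append(knn)
        k_dist.append(knn[-1][0])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy data, not taken from the paper: a 3x3 grid plus one distant point.
    grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
    data = grid + [(10.0, 10.0)]
    print(distance_outliers(data, radius=3.0, min_fraction=0.9))
    print([round(s, 2) for s in lof_scores(data, k=3)])
```

All three functions use brute-force pairwise distances, so the distance-based and LOF sketches cost O(n^2) distance evaluations. That is adequate for illustration but not for large traffic data sets, where an index structure or partitioning would be needed. LOF scores near 1 indicate a point whose local density matches that of its neighbours, while scores well above 1 indicate a local outlier.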