Towards Exploring Interactive Relationship between Clusters and Outliers in Multi-Dimensional Data Analysis

Authors:
Yong Shi;Aidong Zhang
Affiliations:
State University of New York at Buffalo;State University of New York at Buffalo
Venue:
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year:
2005

Abstract

Many data mining algorithms focus on clustering, and many other approaches are designed for outlier detection. We observe that, in many situations, clusters and outliers are concepts whose meanings are inseparable from each other, especially for noisy data sets. It is therefore necessary to treat clusters and outliers as concepts of equal importance in data analysis. In this paper, we present a cluster-outlier iterative detection algorithm that detects clusters and outliers from a different perspective for noisy data sets. In this algorithm, clusters are detected and adjusted according to the intra-relationship within clusters and the inter-relationship between clusters and outliers, and vice versa. The adjustment and modification of the clusters and outliers are performed iteratively until a termination condition is reached. The algorithm can be applied in fields such as pattern recognition, data clustering, and signal processing. Experimental results demonstrate the advantages of our approach.
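The abstract does not give the algorithm's details, so the following is only a toy sketch of the general idea of alternating between cluster adjustment and outlier flagging until nothing changes. It is not the authors' method: the nearest-centroid assignment, the `factor * median distance` cutoff, and the names `iterative_cluster_outlier`, `factor`, and `max_iter` are all assumptions made for illustration.

```python
import math
from statistics import median


def iterative_cluster_outlier(points, centroids, factor=3.0, max_iter=20):
    """Toy alternation between cluster adjustment and outlier flagging.

    Illustrative assumptions (not taken from the paper): points join their
    nearest centroid, and a point is an outlier when its distance to that
    centroid exceeds `factor` times the cluster's median distance.
    """
    centroids = [tuple(c) for c in centroids]
    outliers = set()
    labels = []
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Step 2: flag points unusually far from their own centroid.
        new_outliers = set()
        for k in range(len(centroids)):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:
                continue
            dists = {i: math.dist(points[i], centroids[k]) for i in members}
            cutoff = factor * median(dists.values())
            new_outliers.update(i for i, d in dists.items() if d > cutoff)
        # Step 3: recompute each centroid from its non-outlier members only,
        # so flagged outliers no longer distort the cluster.
        new_centroids = []
        for k, old in enumerate(centroids):
            core = [
                points[i]
                for i, lab in enumerate(labels)
                if lab == k and i not in new_outliers
            ]
            if core:
                new_centroids.append(tuple(sum(x) / len(core) for x in zip(*core)))
            else:
                new_centroids.append(old)
        # Termination condition: neither the outlier set nor the centroids moved.
        if new_outliers == outliers and new_centroids == centroids:
            break
        outliers, centroids = new_outliers, new_centroids
    # labels[i] is the nearest cluster even for points flagged as outliers.
    return labels, sorted(outliers), centroids


# Two tight groups plus one distant point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 30)]
labels, outliers, centroids = iterative_cluster_outlier(points, [(0, 0), (10, 10)])
print(outliers, centroids)
```

On this made-up data the distant point is flagged in the first pass, the second cluster's centroid is then recomputed without it, and the loop stops once a further pass changes nothing.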