TI-DBSCAN: clustering with DBSCAN by means of the triangle inequality

Authors:
Marzena Kryszkiewicz; Piotr Lasek
Affiliation:
Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland (both authors)
Venue:
RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing
Year:
2010

Abstract

Grouping data into meaningful clusters is an important data mining task. DBSCAN is recognized as a high-quality density-based clustering algorithm: it can determine clusters of arbitrary shape and identify noise in the data. The most time-consuming operation in DBSCAN is calculating the neighborhood of each data point; to speed it up, the neighborhood calculation is expected to be supported by spatial access methods. Nevertheless, DBSCAN is not efficient on high-dimensional data. In this paper, we propose a new, efficient algorithm, TI-DBSCAN, and its variant TI-DBSCAN-REF, both of which apply the same clustering methodology as DBSCAN. Unlike DBSCAN, TI-DBSCAN and TI-DBSCAN-REF do not use spatial indices; instead, they use the triangle inequality property to quickly reduce the neighborhood search space. The experimental results show that the new algorithms are up to three orders of magnitude faster than DBSCAN and cluster both low- and high-dimensional data efficiently.
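The pruning idea described above can be illustrated with a minimal, hypothetical Python sketch; it is not the authors' implementation. It assumes Euclidean distance, a single reference point r fixed at the origin, and the usual DBSCAN convention that a point is a core point when its Eps-neighborhood (including the point itself) holds at least MinPts points. The names `ti_dbscan`, `euclidean`, `NOISE` and `UNCLASSIFIED` are illustrative only. Because the triangle inequality gives dist(p, q) >= |dist(p, r) - dist(q, r)|, sorting the points by their distance to r lets each neighborhood query compute exact distances only for points whose reference distance lies within Eps of dist(p, r); the sorting step is this sketch's assumed way of exploiting that bound.

```python
import math
from bisect import bisect_left, bisect_right
from collections import deque
from typing import List, Sequence

NOISE = -1        # label for noise points
UNCLASSIFIED = 0  # label for points not yet visited


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def ti_dbscan(points: List[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (1, 2, ...) or NOISE (-1).

    Hypothetical sketch: a single reference point r (the origin) is assumed.
    """
    n = len(points)
    if n == 0:
        return []
    ref = [0.0] * len(points[0])                      # assumed reference point r
    d_ref = [euclidean(p, ref) for p in points]       # dist(p, r) for every point p
    order = sorted(range(n), key=d_ref.__getitem__)   # point indices sorted by dist(p, r)
    sorted_d = [d_ref[i] for i in order]

    def neighborhood(i: int) -> List[int]:
        # Triangle inequality: dist(p, q) >= |dist(p, r) - dist(q, r)|, so any q
        # whose reference distance differs from p's by more than eps is skipped
        # without computing dist(p, q).
        lo = bisect_left(sorted_d, d_ref[i] - eps)
        hi = bisect_right(sorted_d, d_ref[i] + eps)
        # Only the surviving candidates get an exact distance check.
        return [order[k] for k in range(lo, hi)
                if euclidean(points[i], points[order[k]]) <= eps]

    labels = [UNCLASSIFIED] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] != UNCLASSIFIED:
            continue
        seeds = neighborhood(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1        # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] != UNCLASSIFIED:
                continue
            labels[j] = cluster_id
            neighbors = neighborhood(j)
            if len(neighbors) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(neighbors)
    return labels
```

For example, with `pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]`, the call `ti_dbscan(pts, eps=1.5, min_pts=3)` returns `[1, 1, 1, 1, 2, 2, 2, 2, -1]`: two clusters and one noise point. The clustering logic is plain DBSCAN; only the neighborhood query changes, which is why the result matches a brute-force search. The TI-DBSCAN-REF variant is not sketched here.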