Scalable 2-Pass Data Mining Technique for Large Scale Spatio-temporal Datasets

Authors:
Tahar Kechadi; Michela Bertolotto
Affiliations:
University College Dublin, Belfield, Dublin 4, Ireland; Sergio Di Martino, Filomena Ferrucci: Dip. di Matematica e Informatica, Università degli Studi di Salerno, Italy (Email: sdimartino,fferrucci@unisa.it)
Venue:
KES '07: Proceedings of the 11th International Conference on Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks
Year:
2007

Abstract

In this paper we present a system for mining very large spatio-temporal datasets. The system comprises two main layers: the mining layer and the visualization layer. The mining layer implements a new approach based on a 2-pass strategy that efficiently supports the data-mining process, addresses the spatial and temporal dimensions of the dataset, and supports the visualization and interpretation of results. In the first pass, data objects are grouped according to their close similarity. In the second pass, these groups are clustered to produce new models or patterns. The main motivation for the 2-pass strategy is that the datasets are too large for traditional mining techniques, which consequently cannot deliver the interactivity required by the visualization layer.
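The abstract does not specify the similarity measure, thresholds, or clustering algorithm used in either pass, so the following Python sketch is only one possible reading of the 2-pass idea, not the authors' method. Assumptions: each data object is an (x, y, t) tuple with time already scaled to units comparable to space; the first pass is a leader-style single scan that keeps count/sum summaries per group (in the spirit of BIRCH clustering features); the second pass is single-link merging of group centroids. The names `Group`, `first_pass`, `second_pass`, `eps`, and `delta` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical first-pass summary: point count and per-coordinate sums."""
    count: int = 0
    sums: list = field(default_factory=list)

    def centroid(self):
        return [s / self.count for s in self.sums]

    def add(self, point):
        if not self.sums:
            self.sums = [0.0] * len(point)
        self.count += 1
        self.sums = [s + p for s, p in zip(self.sums, point)]


def first_pass(points, eps):
    """One scan over the raw data: attach each point to the nearest group
    whose centroid lies within eps, otherwise open a new group."""
    groups = []
    for p in points:
        best, best_d = None, eps
        for g in groups:
            d = math.dist(p, g.centroid())
            if d <= best_d:
                best, best_d = g, d
        if best is None:
            best = Group()
            groups.append(best)
        best.add(p)
    return groups


def second_pass(groups, delta):
    """Cluster the (far fewer) groups: single-link merge of any two groups
    whose centroids are within delta, tracked with union-find."""
    parent = list(range(len(groups)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    cents = [g.centroid() for g in groups]
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            if math.dist(cents[i], cents[j]) <= delta:
                parent[find(i)] = find(j)

    clusters = {}
    for i, g in enumerate(groups):
        clusters.setdefault(find(i), []).append(g)
    return list(clusters.values())


if __name__ == "__main__":
    pts = [(0, 0, 0), (0.1, 0, 0), (5, 5, 5), (5.1, 5, 5), (1.0, 0, 0)]
    groups = first_pass(pts, eps=0.5)         # 3 groups
    clusters = second_pass(groups, delta=1.0)  # 2 clusters
    print(len(groups), len(clusters))
```

The design point the sketch tries to capture is that the second pass works on group summaries rather than raw objects, so it can be rerun cheaply with different parameters, which is the kind of interactivity the visualization layer needs. A real system would likely index group centroids spatially instead of scanning all groups per point.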