A Framework for Clustering Uncertain Data Streams

Authors:
Charu C. Aggarwal;Philip S. Yu
Affiliations:
IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA. charu@us.ibm.com;IBM T. J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA. psyu@us.ibm.com
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Abstract

In recent years, uncertain data management applications have grown in importance because of the large number of hardware applications that measure data only approximately. For example, sensors typically produce considerably noisy readings because of inaccuracies in data retrieval and transmission, and because of power failures. In many cases, the estimated error of the underlying data stream is available. This information is very useful for the mining process, since it can be used to improve the quality of the underlying results. In this paper, we propose a method for clustering uncertain data streams. We use a very general model of uncertainty in which we assume that only a few statistical measures of the uncertainty are available. We show that using even modest uncertainty information during the mining process is sufficient to greatly improve the quality of the results, and that our approach is more effective than a purely deterministic method such as CluStream. We test the approach on a variety of real and synthetic data sets and illustrate its advantages in terms of effectiveness and efficiency.