There is an increasing quantity of data with uncertainty, arising from applications such as sensor network measurements and record linkage, and from the output of mining algorithms. This uncertainty is typically formalized as probability density functions over tuple values. Beyond storing and processing such data in a DBMS, it is necessary to perform other data analysis tasks such as data mining. We study the core mining problem of clustering on uncertain data, and define appropriate natural generalizations of standard clustering optimization criteria. Two variations arise, depending on whether a point is automatically associated with its optimal center (the unassigned case), or whether it must be assigned to a fixed cluster no matter where it is actually located (the assigned case). For uncertain versions of k-means and k-median, we show reductions to their corresponding weighted versions on data with no uncertainties. These are simple in the unassigned case, but require some care for the assigned version. Our most interesting results are for uncertain k-center, which generalizes both traditional k-center and k-median objectives. We show a variety of bicriteria approximation algorithms. One picks $O(k \varepsilon^{-1} \log^2 n)$ centers and achieves a $(1 + \varepsilon)$ approximation to the best uncertain k-centers. Another picks 2k centers and achieves a constant factor approximation. Collectively, these results are the first known guaranteed approximation algorithms for the problems of clustering uncertain data.
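To make the distinction between the two variations concrete, the following is a minimal Python sketch, not the paper's own construction. It assumes each uncertain point is a small discrete distribution over locations and that points are independent; the names `UNCERTAIN_POINTS`, `CENTERS`, and all functions are hypothetical and chosen for illustration. It illustrates why the unassigned k-median reduction can be simple: by linearity of expectation, the expected cost over all possible worlds matches the cost of a deterministic weighted instance in which every possible location becomes a point weighted by its probability. It also evaluates the assigned cost, where each uncertain point is pinned to one center regardless of where it lands.

```python
import itertools
import math

# Hypothetical toy input: each uncertain point is a list of
# (location, probability) pairs whose probabilities sum to 1.
UNCERTAIN_POINTS = [
    [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
    [((1.0, 1.0), 0.25), ((5.0, 1.0), 0.75)],
]
CENTERS = [(0.0, 0.0), (5.0, 0.0)]


def nearest_distance(location, centers):
    """Distance from a location to its closest center."""
    return min(math.dist(location, c) for c in centers)


def expected_cost_by_worlds(points, centers):
    """Unassigned k-median cost, averaged over every possible world.

    Assumes the uncertain points are independent, so the probability of a
    world is the product of the chosen locations' probabilities.
    """
    total = 0.0
    for world in itertools.product(*points):
        prob = math.prod(p for _, p in world)
        cost = sum(nearest_distance(loc, centers) for loc, _ in world)
        total += prob * cost
    return total


def weighted_instance(points):
    """Flatten uncertain points into deterministic (location, weight) pairs."""
    return [(loc, p) for point in points for loc, p in point]


def weighted_cost(weighted_points, centers):
    """Weighted k-median cost on data with no uncertainty."""
    return sum(w * nearest_distance(loc, centers) for loc, w in weighted_points)


def assigned_cost(points, centers, assignment):
    """Assigned k-median cost: point i always pays the distance to
    centers[assignment[i]], wherever it is actually located."""
    return sum(
        p * math.dist(loc, centers[assignment[i]])
        for i, point in enumerate(points)
        for loc, p in point
    )


unassigned = expected_cost_by_worlds(UNCERTAIN_POINTS, CENTERS)
reduced = weighted_cost(weighted_instance(UNCERTAIN_POINTS), CENTERS)
pinned = assigned_cost(UNCERTAIN_POINTS, CENTERS, assignment=[0, 1])
```

In this sketch `unassigned` and `reduced` agree up to floating-point error, while `pinned` can only be at least as large, since a fixed center is never closer than the nearest one. This is consistent with the abstract's remark that the assigned version requires more care.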