In this paper, we consider the problem of efficiently computing the distance between uncertain objects. In many real-life applications, data such as sensor readings and weather forecasts are uncertain when they are collected or produced. An uncertain object is described by a probability distribution function (PDF) that gives the probability that the object is actually located at a particular location. Fast and accurate distance computation between uncertain objects is important for many uncertain query evaluation tasks (e.g., range queries and nearest-neighbor queries) and uncertain data mining tasks (e.g., classification, clustering, and outlier detection). However, existing approaches compute distances between samples of the two objects, which is computationally intensive. On the one hand, it is expensive to calculate and store the actual distribution of the possible distance values between two uncertain objects. On the other hand, the expected distance (the weighted average of the pairwise distances among samples of two uncertain objects) provides very limited information and restricts the definitions and usefulness of queries and mining tasks. We propose several approaches to calculate the mean of the actual distance distribution and to approximate its variance. Based on these two quantities, we suggest that the actual distance distribution can be approximated by a standard distribution such as a Gaussian or Gamma distribution. Experiments on real and synthetic data show that our approach produces an approximation in a very short time with acceptable accuracy (about 90%). We suggest that it is practical for the research community to define and develop more powerful queries and data mining tasks based on the distance distribution instead of the expected distance. © 2012 Wiley Periodicals, Inc.
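To make the idea concrete, here is a minimal sketch of the sample-based baseline followed by a moment-matched approximation. It is not the paper's algorithm, whose faster approaches the abstract does not spell out. It assumes each uncertain object is given as a list of weighted sample points and that Euclidean distance is used. The function names (`pairwise_distance_moments`, `fit_gamma`, `gaussian_cdf`) are hypothetical and chosen for illustration only.

```python
import math
import random


def pairwise_distance_moments(samples_a, samples_b, weights_a=None, weights_b=None):
    """Mean and variance of the distance between two sampled uncertain objects.

    Brute-force baseline: visits every pair of samples, so the cost is
    O(len(samples_a) * len(samples_b)). Weights default to uniform.
    """
    n, m = len(samples_a), len(samples_b)
    wa = weights_a if weights_a is not None else [1.0 / n] * n
    wb = weights_b if weights_b is not None else [1.0 / m] * m
    mean = 0.0      # accumulates E[d], i.e. the expected distance
    second = 0.0    # accumulates E[d^2]
    for p, wp in zip(samples_a, wa):
        for q, wq in zip(samples_b, wb):
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            mean += wp * wq * d
            second += wp * wq * d * d
    variance = max(second - mean * mean, 0.0)  # clamp tiny negative rounding error
    return mean, variance


def fit_gamma(mean, variance):
    """Moment-matched Gamma parameters (shape k, scale theta)."""
    if mean <= 0.0 or variance <= 0.0:
        raise ValueError("Gamma fit needs a positive mean and variance")
    return mean * mean / variance, variance / mean


def gaussian_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Illustrative data only: two synthetic 2-D objects with Gaussian noise.
rng = random.Random(0)
obj_a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
obj_b = [(rng.gauss(5.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

mu, var = pairwise_distance_moments(obj_a, obj_b)
shape, scale = fit_gamma(mu, var)
# Approximate probability that the two objects lie within distance 4 of each other.
p_within = gaussian_cdf(4.0, mu, math.sqrt(var))
print(f"mean={mu:.3f} var={var:.3f} gamma=({shape:.2f}, {scale:.3f}) P(d<=4)~{p_within:.3f}")
```

Once the mean and variance are known, a query such as "is the distance at most r with probability at least 0.9" reduces to a single CDF evaluation instead of a pass over all sample pairs. One reason the moments may be cheap to obtain: for independent objects the second moment of the Euclidean distance has a closed form, E[d^2] = ||mu_A - mu_B||^2 + tr(Sigma_A) + tr(Sigma_B), where mu and Sigma are each object's mean vector and covariance matrix. Whether the paper exploits this identity is not stated in the abstract.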