AN EFFICIENT REPRESENTATION MODEL OF DISTANCE DISTRIBUTION BETWEEN UNCERTAIN OBJECTS

  • Authors:
  • Edward Hung;Lurong Xiao;Regant Y.S. Hung

  • Affiliations:
  • Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong;Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong;Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong

  • Venue:
  • Computational Intelligence
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we consider the problem of efficient computation of distance between uncertain objects. In many real life applications, data like sensor readings and weather forecasts are usually uncertain when they are collected or produced. An uncertain object has a probability distribution function (PDF) to represent the probability that it is actually located in a particular location. A fast and accurate distance computation between uncertain objects is important to many uncertain query evaluation (e.g., range queries and nearest-neighbor queries) and uncertain data mining tasks (e.g., classifications, clustering, and outlier detection). However, existing approaches involve distance computations between samples of two objects, which is very computationally intensive. On one hand, it is expensive to calculate and store the actual distribution of the possible distance values between two uncertain objects. On the other hand, the expected distance (the weighted average of the pairwise distances among samples of two uncertain objects) provides very limited information and also restricts the definitions and usefulness of queries and mining tasks. In this paper, we propose several approaches to calculate the mean of the actual distance distribution and approximate its variance. Based on these, we suggest that the actual distance distribution could be approximated using a standard distribution like Gaussian or Gamma distribution. Experiments on real data and synthetic data show that our approach produces an approximation in a very short time with acceptable accuracy (about 90%). We suggest that it is practical for the research communities to define and develop more powerful queries and data mining tasks based on the distance distribution instead of the expected distance. © 2012 Wiley Periodicals, Inc.