EMU: An expectation maximization based approach for clustering uncertain data

  • Authors:
  • Biao Qin; Yuni Xia; Fang Li; Jiaqi Ge

  • Affiliations:
  • Department of Computer Science, Renmin University of China, Beijing, China; Department of Computer & Information Science, Indiana University Purdue University Indianapolis, Indianapolis, IN, US; Department of Mathematical Sciences, Indiana University Purdue University Indianapolis, Indianapolis, IN, US; Department of Computer & Information Science, Indiana University Purdue University Indianapolis, Indianapolis, IN, US

  • Venue:
  • Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology
  • Year:
  • 2013

Abstract

Real-world applications such as sensor networks and RFID networks usually generate data with uncertainty. Data uncertainty arises from many sources, such as measurement errors, limited precision, and data aggregation. Classical data mining applications need to be modified and extended to handle uncertain data; otherwise, their performance may be dramatically degraded by data uncertainty. In this paper, we define an uncertain data model for both numerical and categorical uncertain data and propose EMU, a new Expectation-Maximization based algorithm for clustering uncertain data. The approach is designed to find the distribution parameters that maximize model quality on uncertain data and thereby correctly identify the clusters. Our clustering algorithm can process both numerical and categorical uncertain data. In our experiments, we use both synthetic and real data sets to evaluate the effectiveness and robustness of the proposed algorithm.
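The abstract does not give EMU's update rules, so the following is only a generic illustration of how Expectation-Maximization can account for per-object uncertainty, not the EMU algorithm itself. It assumes a one-dimensional Gaussian mixture in which each uncertain object is reported as a Gaussian with a known mean and variance; the function name `em_uncertain_1d` and all parameter names are hypothetical, and categorical attributes are not covered.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_uncertain_1d(means, variances, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture to uncertain objects.

    Assumption (not from the paper): object i is observed as N(means[i], variances[i]),
    i.e. a reported value plus a known measurement variance.
    """
    n = len(means)
    lo, hi = min(means), max(means)
    # Deterministic initialisation: spread component means across the data range.
    mu = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    centre = sum(means) / n
    spread = sum((m - centre) ** 2 for m in means) / n
    var = [max(spread, 1e-6)] * k
    weight = [1.0 / k] * k
    resp = [[1.0 / k] * k for _ in range(n)]

    for _ in range(iters):
        # E-step: the reported value of object i under component j is distributed
        # as N(mu[j], var[j] + variances[i]), so its noise widens the component.
        resp = []
        for m, s in zip(means, variances):
            likes = [weight[j] * normal_pdf(m, mu[j], var[j] + s) for j in range(k)]
            total = sum(likes) or 1e-300
            resp.append([like / total for like in likes])

        # M-step: update parameters from the posterior mean and variance of each
        # object's latent true value, weighted by its responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            gains = [var[j] / (var[j] + s) for s in variances]
            post_mean = [mu[j] + g * (m - mu[j]) for g, m in zip(gains, means)]
            post_var = [(1.0 - g) * var[j] for g in gains]
            new_mu = sum(r[j] * pm for r, pm in zip(resp, post_mean)) / nj
            new_var = sum(
                r[j] * ((pm - new_mu) ** 2 + pv)
                for r, pm, pv in zip(resp, post_mean, post_var)
            ) / nj
            mu[j], var[j], weight[j] = new_mu, max(new_var, 1e-6), nj / n

    # Hard cluster label: the component with the largest responsibility.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mu, var, weight, labels
```

In this sketch, objects with a large measurement variance pull less strongly on the cluster parameters than precise ones, which is one common way to keep noisy readings from distorting the fitted clusters.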