Algorithms for finding and correcting four kinds of data mistakes in information table

Authors:
Feng Honghai;Xu Hao;Liu Baoyan;He LiYun;Yang Bingru;Li Yueli
Affiliations:
Hebei Agricultural University, Baoding, Hebei, China;China-Japan Friendship Hospital, Beijing, China;China Academy of Traditional Chinese Medicine, Beijing, China;China Academy of Traditional Chinese Medicine, Beijing, China;University of Science and Technology Beijing, Beijing, China;Hebei Agricultural University, Baoding, Hebei, China
Venue:
KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
Year:
2006

Abstract

Real-world data sets usually contain four kinds of mistaken values: a value recorded in the wrong unit, a value with the radix (decimal) point in the wrong place, a scribal error, and a computational mistake. In this paper, we propose two algorithms for finding these four kinds of mistaken data. Experiments on SARS and coronary heart disease data sets show that the two algorithms are effective: they find mistakes in both data sets, and the results agree with those found manually by medical experts.
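
The abstract does not describe the two algorithms themselves, so the following is only an illustrative sketch of how the first two kinds of mistake might be told apart, not the authors' method. It assumes a single numeric column of positive values, uses the column median as the reference, and picks the explanation (no error, a power-of-ten shift, or a known unit conversion factor) that brings a value closest to that reference. The function name `classify_values`, the thresholds `max_spread` and `max_shift`, and the example conversion factor 18.0 are all hypothetical. Scribal and computational mistakes are not distinguished here; anything unexplained is simply labeled "other".

```python
import math
from statistics import median


def classify_values(values, unit_factors=(), max_spread=2.0, max_shift=3):
    """Label each value as 'ok', 'decimal_point', 'unit', or 'other'.

    Assumes all values are positive. Returns a list of
    (value, label, suggested_value) tuples; this is a heuristic sketch,
    not the algorithm from the paper.
    """
    ref = median(values)  # robust reference as long as most values are correct

    # Candidate explanations: (label, factor by which the value was scaled).
    candidates = [("ok", 1.0)]
    for k in range(-max_shift, max_shift + 1):
        if k != 0:
            candidates.append(("decimal_point", 10.0 ** k))
    for f in unit_factors:
        candidates.append(("unit", f))
        candidates.append(("unit", 1.0 / f))

    def distance(v, factor):
        # Log-space distance between the corrected value and the reference.
        return abs(math.log(v / factor / ref))

    results = []
    for v in values:
        label, factor = min(candidates, key=lambda c: distance(v, c[1]))
        if distance(v, factor) > math.log(max_spread):
            # No candidate explains the value well (possibly a scribal
            # or computational mistake, or a genuine outlier).
            results.append((v, "other", None))
        elif label == "ok":
            results.append((v, "ok", v))
        else:
            results.append((v, label, v / factor))
    return results


if __name__ == "__main__":
    column = [5.1, 4.8, 5.5, 52.0, 95.0, 5.0, 4.9]
    for row in classify_values(column, unit_factors=(18.0,)):
        print(row)
```

A median-based reference is used because it tolerates a minority of wrong entries; a per-record reference (for example, a patient's other measurements) would be a natural alternative where one exists.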