Data quality evaluation and improvement for prognostic modeling using visual assessment based data partitioning method

Authors:
Yan Chen;Feibai Zhu;Jay Lee
Affiliations:
NSF Center for Intelligent Maintenance Systems, University of Cincinnati, OH, United States;NSF Center for Intelligent Maintenance Systems, University of Cincinnati, OH, United States;NSF Center for Intelligent Maintenance Systems, University of Cincinnati, OH, United States
Venue:
Computers in Industry
Year:
2013

Abstract

When developing Prognostics and Health Management (PHM) applications for manufacturing systems, the acquired data frequently comes with issues that hinder further analysis. However, there is neither a clear definition of data quality nor an evaluation method to quantify whether acquired data is suitable for prognostic modeling tasks such as failure detection, diagnosis, and prediction. In particular, in data-driven health diagnosis modeling of engineering systems, the acquired data is expected to contain clusters that can be used to differentiate multiple system health conditions. In most cases, once data is acquired, people tend to assume intuitively that it can be clustered into subgroups. This bias can lead to the acceptance of false information in the data. Furthermore, most existing metrics, such as clustering tendency in statistics and clusterability in data mining, evaluate individual data characteristics in isolation, without considering prognostic modeling. This paper proposes a new method to evaluate and improve data quality for system health diagnosis modeling. Clusters, as the critical data characteristic for modeling multiple system conditions, are first estimated by "visualization" of the dissimilarity spectrum obtained from spectral analysis and then evaluated in terms of their fitness and their separation from each other. A visual-assessment-based outlier detection method is also proposed, which reuses the graphical intermediate results of the preceding evaluation to recognize outliers in the data. Finally, a group of bearing test data acquired from real industrial applications is used to demonstrate how the proposed methods evaluate and improve data quality.
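
The abstract does not spell out the algorithm, so the following is only a rough illustration of what "visual assessment" of a dissimilarity matrix can look like. It uses the well-known VAT (Visual Assessment of cluster Tendency) reordering as a stand-in, not the paper's spectral method. The synthetic two-cluster data and the function names `pairwise_dissimilarity`, `vat_order`, and `reorder` are all assumptions made for this sketch.

```python
import numpy as np


def pairwise_dissimilarity(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def vat_order(D):
    """VAT-style ordering of a symmetric dissimilarity matrix D.

    Starts at one endpoint of the largest dissimilarity, then repeatedly
    appends the unvisited point closest to any already ordered point
    (the same greedy rule as Prim's minimum spanning tree algorithm).
    """
    n = D.shape[0]
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # min_dist[j]: smallest dissimilarity from j to the ordered set so far
    min_dist = D[start].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining)[0]
        nxt = int(candidates[np.argmin(min_dist[candidates])])
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)


def reorder(D, order):
    """Apply the same permutation to the rows and columns of D."""
    return D[np.ix_(order, order)]


# Hypothetical data: two well-separated health conditions, shuffled.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(5.0, 0.3, size=(20, 2)),
])
perm = rng.permutation(len(X))
X = X[perm]
labels = (perm >= 20).astype(int)  # hidden ground truth, used only for checking

D = pairwise_dissimilarity(X)
order = vat_order(D)
D_star = reorder(D, order)
ordered_labels = labels[order]
```

Displaying `D_star` as a grayscale image (for example with a plotting library) would be expected to show low-dissimilarity blocks along the diagonal when the data contains separable groups. The absence of such blocks is a hint that the assumption of clusterable data deserves scrutiny.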