A surrogate variable-based data mining method using CFS and RSM

  • Authors:
  • Le Yang; Sangmun Shin; Yongsun Choi; Myeonggil Choi; Younghee Lee

  • Affiliations:
  • Department of Systems Management & Engineering, Inje University, Gimhae, Gyungnam, South Korea (Le Yang, Sangmun Shin, Yongsun Choi, Myeonggil Choi); Department of Industrial Management Engineering, Dong-A University, Handang-Dong, Saha, Busan, South Korea (Younghee Lee)

  • Venue:
  • ACOS'07 Proceedings of the 6th Conference on WSEAS International Conference on Applied Computer Science - Volume 6
  • Year:
  • 2007

Abstract

In many scientific and engineering fields, a number of data sets are uncontrollable and hard to handle because measuring a performance variable is often destructive or very expensive; such uncontrollable variables are known as noise factors. Although these noise factors, which may be left uncontrolled for manufacturing or cost reasons, have emerged as a key problem in data mining (DM) and analysis, most DM methods do not address the robustness of their solutions, either by considering noise factors or by incorporating specific statistical inference. To address this problem, the primary objective of this paper is to propose an integrated approach, called the surrogate variable-based data mining method (SVDM), which performs dimensionality reduction by extracting the significant factors from the raw data sets using correlation-based feature selection (CFS). The proposed method then takes noise factors into account, applying the principle of surrogate variables to make the analysis robust. In addition, the proposed method is far more effective when 100% inspection and a destructive characteristic or response are involved. Finally, response surface methodology (RSM), a statistical tool for modeling and analysis in situations where the response of interest is affected by several input factors, is used for further statistical analysis.
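To make the two named building blocks concrete, here is a minimal sketch of a CFS-then-RSM pipeline under stated assumptions. It is not the SVDM procedure from the paper, and it omits the surrogate-variable and noise-factor step entirely. Assumptions: CFS merit is computed with absolute Pearson correlation (other correlation measures are also used with CFS). The subset search is a simplified greedy forward search rather than a full best-first search. The RSM stage fits a full second-order polynomial by ordinary least squares. The data are synthetic, and all function names (`cfs_merit`, `cfs_forward_select`, `quadratic_design`, `fit_rsm`) are hypothetical.

```python
import itertools
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k(k-1)*r_ff).
    Assumption: absolute Pearson correlation is the correlation measure."""
    k = len(subset)
    # Mean absolute feature-response correlation.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # Mean absolute feature-feature intercorrelation (0 for a single feature).
    pairs = list(itertools.combinations(subset, 2))
    if pairs:
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    else:
        r_ff = 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(X, y):
    """Simplified greedy forward search: repeatedly add the feature that
    most increases the merit; stop when no addition improves it."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: cfs_merit(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best


def quadratic_design(X):
    """Design matrix for a full second-order response surface model:
    intercept, linear, pure quadratic, and two-factor interaction columns."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)


def fit_rsm(X, y):
    """Ordinary least-squares estimates of the second-order model coefficients."""
    beta, *_ = np.linalg.lstsq(quadratic_design(X), y, rcond=None)
    return beta


# Synthetic data (illustrative only): y depends on x0 and x1; x2..x5 are irrelevant.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 6))
y = (3 + 2 * X[:, 0] - 1.5 * X[:, 1] + X[:, 0] ** 2
     + 0.5 * X[:, 0] * X[:, 1] + rng.normal(0, 0.1, 200))

selected, merit = cfs_forward_select(X, y)
beta = fit_rsm(X[:, selected], y)
print("selected features:", selected)
print("merit:", round(merit, 3))
# Coefficient order: intercept, linear, quadratic, interaction terms.
print("coefficients:", np.round(beta, 2))
```

The CFS merit rewards subsets whose features correlate strongly with the response but weakly with each other. On data like this, the search should keep the two informative inputs and discard the rest. The second-order fit on the reduced set then estimates the curvature and interaction terms that RSM is typically used to study.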