Rough set-based SAR analysis: An inductive method

  • Authors:
  • Ying Dong; Bingren Xiang; Teng Wang; Hao Liu; Lingbo Qu

  • Affiliations:
  • Department of Organic Chemistry, China Pharmaceutical University, 24 Tongjia Xiang, Nanjing, 210009, PR China and Center for Instrumental Analysis, China Pharmaceutical University, 24 Tongjia Xian ...; Center for Instrumental Analysis, China Pharmaceutical University, 24 Tongjia Xiang, Nanjing 210009, PR China; Center for Instrumental Analysis, China Pharmaceutical University, 24 Tongjia Xiang, Nanjing 210009, PR China; Department of Pharmacy, Guangdong Vocational College of Chemical Engineering and Pharmaceutics, 321 North Longdong Road, Guangzhou 510520, PR China; Department of Chemistry, Zhengzhou University, 100 Science Road, Zhengzhou 450052, PR China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2010

Abstract

In this paper, the rough set algorithm was used as a new methodology for building structure-activity relationship (SAR) models, acting as both a feature selector and a nonlinear rule generator. An SAR model expressed as human-readable if-then rules was developed for the inhibition of the serine/threonine kinase CDK1/cyclinB by compounds from the indirubin inhibitor family. The feature selection ability of the rough set algorithm was compared with the built-in approaches in Weka (CfsSubsetEval and ConsistencySubsetEval) under leave-one-out (LOO) and 10-fold cross-validation. A rule-based SAR model was built from a training set of 31 objects using a reduct generated by the rough set method. The predictive ability of the model was evaluated on an external test set of 16 compounds. Established approaches, including decision tree learners, neural networks, support vector classifiers and LogitBoost, were used to benchmark the performance of the rough set method. The results suggest that the rough set method can play an important role in data preprocessing and model building for nonlinear SAR analysis. The advantages and limitations of rough set-based SAR analysis were discussed. The results agreed well with the available understanding from cocrystal structures and 3D QSAR models.
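To illustrate what a reduct and a set of if-then rules look like in practice, here is a minimal Python sketch. It is not the authors' implementation and does not use their data: the attribute names, the five-row decision table, and the helper functions `is_consistent`, `find_reducts`, and `generate_rules` are all hypothetical. It also assumes a consistent decision table (no two objects with identical attribute values but different decisions) and uses exhaustive search, which is only practical for a handful of attributes.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition attributes, decision).
# Attribute names and values are invented for illustration only.
ATTRIBUTES = ("h_bond_donor", "bulky_substituent", "polar_group")
TABLE = [
    ({"h_bond_donor": 1, "bulky_substituent": 0, "polar_group": 1}, "active"),
    ({"h_bond_donor": 1, "bulky_substituent": 1, "polar_group": 1}, "inactive"),
    ({"h_bond_donor": 0, "bulky_substituent": 0, "polar_group": 0}, "inactive"),
    ({"h_bond_donor": 1, "bulky_substituent": 0, "polar_group": 0}, "active"),
    ({"h_bond_donor": 0, "bulky_substituent": 1, "polar_group": 1}, "inactive"),
]


def is_consistent(table, attrs):
    """True if objects indiscernible on `attrs` always share one decision."""
    seen = {}
    for conditions, decision in table:
        key = tuple(conditions[a] for a in attrs)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def find_reducts(table, attributes):
    """Exhaustively collect minimal attribute subsets that stay consistent."""
    reducts = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            # A superset of an already-found reduct is not minimal; skip it.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if is_consistent(table, subset):
                reducts.append(subset)
    return reducts


def generate_rules(table, reduct):
    """Map each value combination observed on the reduct to its decision."""
    rules = {}
    for conditions, decision in table:
        rules[tuple(conditions[a] for a in reduct)] = decision
    return rules


reducts = find_reducts(TABLE, ATTRIBUTES)
rules = generate_rules(TABLE, reducts[0])
```

Each entry in `rules` reads as an if-then rule over the reduct's attributes, for example "if h_bond_donor = 1 and bulky_substituent = 0 then active". On this toy table the third attribute is dropped because it adds no discriminating power, which is the feature selection role the abstract describes.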