Improved modeling of clinical data with kernel methods

  • Authors:
  • Anneleen Daemen;Dirk Timmerman;Thierry Van den Bosch;Cecilia Bottomley;Emma Kirk;Caroline Van Holsbeke;Lil Valentin;Tom Bourne;Bart De Moor

  • Affiliations:
  • Department of Electrical Engineering, Katholieke Universiteit Leuven, 3001 Leuven, Belgium;Department of Obstetrics and Gynecology, University Hospitals Leuven, Katholieke Universiteit Leuven, 3000 Leuven, Belgium;Department of Obstetrics and Gynecology, University Hospitals Leuven, Katholieke Universiteit Leuven, 3000 Leuven, Belgium;Department of Obstetrics and Gynaecology, St. George's Hospital, St. George's University of London, London SW17 0RE, UK;Early Pregnancy and Gynecological Unit, St. George's Hospital, St. George's University of London, London SW17 0RE, UK;Department of Obstetrics and Gynecology, University Hospitals Leuven, Katholieke Universiteit Leuven, 3000 Leuven, Belgium and Hospital Oost-Limburg, 3600 Genk, Belgium;Malmö University Hospital, Lund University, SE 20502 Malmö, Sweden;Department of Obstetrics and Gynecology, University Hospitals Leuven, Katholieke Universiteit Leuven, 3000 Leuven, Belgium and Hammersmith Hospital, Imperial College London, London W12 0NN, UK;Department of Electrical Engineering, Katholieke Universiteit Leuven, 3001 Leuven, Belgium

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2012

Abstract

Objective: Despite the rise of high-throughput technologies, clinical data such as age, gender and medical history still guide clinical management for most diseases and examinations. To improve clinical management, available patient information should be fully exploited. This requires appropriate modeling of the relevant parameters.
Methods: When kernel methods are used, traditional kernel functions such as the linear kernel are often applied to the set of clinical parameters. These kernel functions are poorly suited to clinical data, which mix variable types, each variable having its own range. We propose a new kernel function specifically adapted to the characteristics of clinical data.
Results: The clinical kernel function provides a better representation of patients' similarity by equalizing the influence of all variables and taking into account the range r of the variables. Moreover, it is robust with respect to changes in r. Incorporated in a least squares support vector machine, the new kernel function results in significantly improved diagnosis, prognosis and prediction of therapy response. This is illustrated on four clinical data sets within gynecology, with an average increase in test area under the ROC curve (AUC) of 0.023, 0.021, 0.122 and 0.019, respectively. Moreover, when combining clinical parameters and expression data in three case studies on breast cancer, results improved overall both with use of the new kernel function and when considering the two data types in a weighted fashion, with a larger weight assigned to the clinical parameters. The increase in AUC with respect to a standard kernel function and/or unweighted data combination was at most 0.127, 0.042 and 0.118 for the three case studies.
Conclusion: For clinical data consisting of variables of different types, the proposed kernel function - which takes into account the type and range of each variable - has been shown to be a better alternative for linear and non-linear classification problems.
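
The abstract does not state the kernel formula, so the following is only a minimal sketch of one formulation consistent with its description. It assumes that continuous and ordinal variables are compared as (r - |a - b|) / r, that nominal variables score 1 when equal and 0 otherwise, and that the per-variable similarities are averaged so each variable has equal influence. The function names, the toy patients and the weight of 0.7 are hypothetical, and the weighted sum is just one simple way to give the clinical kernel a larger weight than an expression-data kernel.

```python
import numpy as np


def clinical_kernel(X, Y, ranges, is_nominal):
    """Sketch of a range-aware kernel for mixed clinical variables.

    Assumed form (not given in the abstract):
      continuous/ordinal variable: (r - |a - b|) / r
      nominal variable:            1 if a == b else 0
    The kernel value is the mean over all variables.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    is_nominal = np.asarray(is_nominal, dtype=bool)
    # Use a dummy range of 1 for nominal variables to avoid dividing by zero.
    r = np.where(is_nominal, 1.0, np.asarray(ranges, dtype=float))

    # Pairwise absolute differences, shape (n_x, n_y, n_variables).
    diff = np.abs(X[:, None, :] - Y[None, :, :])
    sim_continuous = (r - diff) / r
    sim_nominal = (diff == 0).astype(float)
    sim = np.where(is_nominal, sim_nominal, sim_continuous)

    # Averaging gives every variable the same influence.
    return sim.mean(axis=2)


def combine_kernels(K_clinical, K_expression, w_clinical=0.7):
    """Hypothetical weighted sum of two kernel matrices."""
    return w_clinical * K_clinical + (1.0 - w_clinical) * K_expression


# Toy patients: [age in years, sex coded 0/1]; age range assumed to be 40.
patients = [[30, 0], [50, 0], [30, 1]]
K = clinical_kernel(patients, patients, ranges=[40, 1], is_nominal=[False, True])
print(K)
```

In this toy example the first two patients differ by half the age range and share the same sex, so their similarity is (0.5 + 1) / 2 = 0.75. The resulting matrix could be passed to any kernel classifier that accepts a precomputed kernel.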