Probabilistic classifiers and automated cancer registration: An exploratory application

  • Authors:
  • Sandro Tognazzo;Bovo Emanuela;Fiore Anna Rita;Guzzinati Stefano;Monetti Daniele;Stocco Cramen Fiorella;Zambon Paola

  • Affiliations:
  • Venetian Tumour Registry, Registro Tumori del Veneto, Istituto Oncologico Veneto-IRCCS, 35128 Padua, Italy;Venetian Tumour Registry, Registro Tumori del Veneto, Istituto Oncologico Veneto-IRCCS, 35128 Padua, Italy;Venetian Tumour Registry, Registro Tumori del Veneto, Istituto Oncologico Veneto-IRCCS, 35128 Padua, Italy;Venetian Tumour Registry, Registro Tumori del Veneto, Istituto Oncologico Veneto-IRCCS, 35128 Padua, Italy;Venetian Tumour Registry, Registro Tumori del Veneto, Istituto Oncologico Veneto-IRCCS, 35128 Padua, Italy;Venetian Tumour Registry, Registro Tumori del Veneto, Istituto Oncologico Veneto-IRCCS, 35128 Padua, Italy;Department of Oncology, University of Padua, Padua, Italy

  • Venue:
  • Journal of Biomedical Informatics
  • Year:
  • 2009

Abstract

The performance of two probabilistic classifiers (random forests and multinomial logit models) in automatically defining cancer cases was tested on 5608 subjects who were registered by the Venetian Tumour Registry (RTV) during 1987-1996 and manually checked for possible second cancers occurring during 1997-1999. Eightfold cross-validation was used to estimate the classification error, and 63 predictive variables were entered into the model fitting. The random forest makes it possible to automatically classify 45% of subjects with a classification error lower than 5%, while the corresponding error is 31% for the multilogit model. The performance of the former classifier is appealing: it indicates a potential drop in manually checked cases from 1750 to 960 per incidence year, with a moderate error rate. This result suggests refining the approach and extending it to other categories of manually treated cases.
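
The abstract does not say how subjects were selected for automatic classification. One plausible reading, offered here only as an assumption, is that cases are ranked by the classifier's predicted class probability and the most confident ones are accepted automatically for as long as the error rate among them stays below 5%. The minimal Python sketch below illustrates that idea on made-up toy data. The function names `coverage_at_error` and `workload_reduction` are hypothetical and do not come from the paper. The last helper only checks the workload arithmetic using the two figures quoted in the abstract (1750 and 960).

```python
from typing import List, Optional, Tuple


def coverage_at_error(
    confidences: List[float],
    correct: List[bool],
    max_error: float = 0.05,
) -> Tuple[float, Optional[float], float]:
    """Largest share of cases that can be accepted automatically.

    Cases are ranked by confidence (highest first). The function returns
    (coverage, threshold, error) for the largest top-ranked group whose
    error rate is strictly below max_error. The threshold is None when
    no group qualifies.
    """
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    best: Tuple[float, Optional[float], float] = (0.0, None, 0.0)
    errors = 0
    for i, (conf, ok) in enumerate(ranked, start=1):
        if not ok:
            errors += 1
        # Cases with identical confidence must be accepted together,
        # so only evaluate at the end of each run of ties.
        if i < n and ranked[i][0] == conf:
            continue
        if errors / i < max_error:
            best = (i / n, conf, errors / i)
    return best


def workload_reduction(before: int, after: int) -> float:
    """Relative drop in manually checked cases."""
    return (before - after) / before


# Toy data, invented for illustration only (not registry data).
confidences = [0.99, 0.97, 0.95, 0.92, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55]
correct = [True, True, True, True, False, True, True, False, True, True]

coverage, threshold, error = coverage_at_error(confidences, correct, 0.05)
print(f"coverage={coverage:.0%} threshold={threshold} error={error:.0%}")

# Figures quoted in the abstract: 1750 -> 960 manually checked cases per year.
print(f"workload reduction={workload_reduction(1750, 960):.0%}")
```

On the toy data, the first misclassified case sits at rank five, so only the four most confident cases can be accepted. The workload helper shows that the drop from 1750 to 960 cases is about 45%, which matches the share of subjects the random forest is reported to classify automatically.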