Using text classification to predict the gene knockout behaviour of S. Cerevisiae

  • Authors: Patrick Caldon
  • Affiliations: School of Computer Science and Engineering, University of New South Wales, Kensington, NSW, Australia
  • Venue: APBC '03 Proceedings of the First Asia-Pacific bioinformatics conference on Bioinformatics 2003 - Volume 19
  • Year: 2003

Abstract

A naive Bayes classifier was used to analyze gene behavior from text data and was submitted as an entry to the 2002 KDD Cup, a data mining exercise to predict the behavior of the yeast S. cerevisiae. The solution was based on the multinomial event model for text classification (McCallum & Nigam 1998), with a feature selection mechanism added. Despite the simplicity of this model, it achieved performance close to that of the best entries in the competition, which used more sophisticated techniques. It appears that a seemingly minor effort in using prior knowledge to conflate the gene classes, together with the previously described effectiveness of the naive Bayes method, contributed to this success.
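
The general approach described in the abstract can be illustrated with a minimal sketch of a multinomial naive Bayes text classifier preceded by a feature selection step. This is not the author's implementation. The abstract does not say which feature selection mechanism was used, so ranking words by mutual information with the class is an assumption made here for illustration. The function names (`select_features`, `train`, `predict`), the Laplace smoothing parameter `alpha`, the toy documents, and the class labels `"change"` and `"nc"` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, labels, k):
    """Keep the k words with the highest mutual information between
    word presence and class (an assumed selection criterion)."""
    n = len(docs)
    class_n = Counter(labels)
    df = Counter()               # number of documents containing each word
    df_c = defaultdict(Counter)  # the same count, split by class
    for tokens, y in zip(docs, labels):
        for w in set(tokens):
            df[w] += 1
            df_c[y][w] += 1

    def mutual_information(w):
        total = 0.0
        for c, nc in class_n.items():
            for present in (True, False):
                n_w = df[w] if present else n - df[w]
                n_wc = df_c[c][w] if present else nc - df_c[c][w]
                if n_wc > 0:
                    total += (n_wc / n) * math.log(n_wc * n / (n_w * nc))
        return total

    return set(sorted(df, key=mutual_information, reverse=True)[:k])


def train(docs, labels, vocab, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word probabilities
    under the multinomial event model, restricted to vocab."""
    class_n = Counter(labels)
    counts = defaultdict(Counter)
    for tokens, y in zip(docs, labels):
        counts[y].update(w for w in tokens if w in vocab)
    model = {}
    for c, nc in class_n.items():
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (math.log(nc / len(docs)), log_lik)
    return model


def predict(model, tokens):
    """Return the class with the highest posterior log score.
    Words outside the selected vocabulary are ignored."""
    scores = {
        c: log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
        for c, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


# Toy data: each "document" is the tokenized text associated with one gene.
docs = [
    ["kinase", "membrane", "transport"],
    ["kinase", "signal", "nucleus"],
    ["ribosome", "nucleus", "translation"],
    ["ribosome", "membrane", "translation"],
]
labels = ["change", "change", "nc", "nc"]

vocab = select_features(docs, labels, k=3)
model = train(docs, labels, vocab)
prediction = predict(model, ["kinase", "signal"])
```

In this toy example the selection step keeps the three words that perfectly separate the two classes, and the classifier scores a new document by summing the log probabilities of its selected words per class. Merging related classes before training, as the abstract describes, would amount to remapping `labels` before calling `train`.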