Sequential patterns to discover and characterise biological relations

  • Authors:
  • Peggy Cellier;Thierry Charnois;Marc Plantevit

  • Affiliations:
  • Université de Caen, CNRS Université de Caen, GREYC, UMR6072, France;Université de Caen, CNRS Université de Caen, GREYC, UMR6072, France;Université de Lyon, CNRS Université de Lyon 1, LIRIS, UMR5205, France

  • Venue:
  • CICLing'10 Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2010

Abstract

In this paper, we present a method to automatically detect and characterise interactions between genes in biomedical literature. Our approach combines data mining techniques: frequent sequential patterns filtered by linguistic constraints, and recursive mining. Unlike most Natural Language Processing (NLP) approaches, it does not use syntactic parsing to learn and apply linguistic rules, and it requires no resource other than the training corpus to learn patterns. The process has two steps. First, frequent sequential patterns are extracted from the training corpus. Second, once validated, those patterns are applied to the application corpus to detect and characterise new interactions. An advantage of our method is that interactions can be enriched with modalities and biological information. The training corpus consists of two corpora containing only sentences with gene interactions; a separate corpus of PubMed abstracts serves as the application corpus. Our evaluation shows good precision and acceptable recall for both targets: interaction detection and interaction characterisation.
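
The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a simple level-wise (Apriori-style) search in place of whatever miner the paper uses, it omits recursive mining, and it assumes that sentences are already lemmatised with gene names replaced by a `GENE` placeholder. The toy corpus, the `INTERACTION_WORDS` set, and all function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's items occur in the sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, corpus):
    """Number of sentences in the corpus that contain the pattern."""
    return sum(is_subsequence(pattern, sentence) for sentence in corpus)


def mine_sequential_patterns(corpus, min_support, max_len=4):
    """Level-wise search: extend each frequent pattern by one frequent item."""
    items = sorted({token for sentence in corpus for token in sentence})
    frequent_items = [i for i in items if support((i,), corpus) >= min_support]
    frequent = {}
    level = [(i,) for i in frequent_items]
    length = 1
    while level and length <= max_len:
        for pattern in level:
            frequent[pattern] = support(pattern, corpus)
        # Any frequent pattern has a frequent prefix, so appending is enough.
        candidates = [p + (i,) for p in level for i in frequent_items]
        level = [c for c in candidates if support(c, corpus) >= min_support]
        length += 1
    return frequent


# Hypothetical linguistic constraint: two gene mentions and an interaction word.
INTERACTION_WORDS = {"interact", "inhibit"}


def satisfies_constraint(pattern):
    return pattern.count("GENE") >= 2 and any(t in INTERACTION_WORDS for t in pattern)


# Step 1: mine the (toy) training corpus and filter the patterns.
training = [
    ["GENE", "interact", "with", "GENE"],
    ["GENE", "may", "interact", "with", "GENE"],
    ["GENE", "inhibit", "GENE"],
]
frequent = mine_sequential_patterns(training, min_support=2)
validated = {p: s for p, s in frequent.items() if satisfies_constraint(p)}

# Step 2: apply the retained patterns to a sentence from the application corpus.
new_sentence = ["GENE", "do", "not", "interact", "with", "GENE"]
matches = [p for p in validated if is_subsequence(p, new_sentence)]
```

In this sketch the constraint filter plays the role of the linguistic constraints, and the unmatched tokens in `new_sentence` (here `do`, `not`) are the kind of context from which a modality such as negation could be read; how the paper actually derives modalities is not specified in the abstract.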