Multi-way association extraction and visualization from biological text documents using hyper-graphs: Applications to genetic association studies for diseases

  • Authors:
  • Snehasis Mukhopadhyay;Mathew Palakal;Kalyan Maddu

  • Affiliations:
  • Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 West Michigan Street SL 280J, Indianapolis, IN 46202, USA (all three authors)

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2010

Abstract

Objectives: Biological research literature, like the literature of many other domains of human endeavor, represents a rich, ever-growing source of knowledge. An important form of such biological knowledge is the set of associations among biological entities such as genes, proteins, diseases, drugs, and chemicals. There has been a considerable amount of recent research on extracting various kinds of binary associations (e.g., gene-gene, gene-protein, protein-protein) using different text mining approaches. However, an important aspect of such associations (e.g., "gene A activates protein B") is identifying the context in which they occur (e.g., "gene A activates protein B in the context of disease C in organ D under the influence of chemical E"). Such contexts can be represented appropriately by a multi-way relationship involving more than two objects (e.g., objects A, B, C, D, E) rather than the usual binary relationship (objects A and B).

Methods: Such multi-way relations naturally lead to a hyper-graph representation of the knowledge rather than a binary graph. Hyper-graph based multi-way knowledge extraction from biological text literature is a computationally difficult problem (due to its combinatorial nature) that has not received much attention from the bioinformatics research community. In this paper, we describe and compare two different approaches to such multi-way hyper-graph extraction: one based on an exhaustive enumeration of all multi-way hyper-edges, and the other based on an extension of the well-known A Priori algorithm for structured data to the case of unstructured textual data. We also present a representative graph based approach to visualizing these genetic association hyper-graphs.

Results: Two case studies are conducted for two biomedical problems (related to lung cancer and colorectal cancer, respectively), illustrating that the latter approach (the text-based A Priori method) identifies the same hyper-edges as the former approach (the exhaustive method), but at a much lower computational cost. The extracted hyper-relations are presented in the paper as cognition-rich representative graphs, which represent the corresponding hyper-graphs.

Conclusions: The text-based A Priori algorithm is a practical, useful method for extracting hyper-graphs that represent multi-way associations among biological objects. These hyper-graphs, and their visualization using representative graphs, can provide important contextual information for understanding gene-gene associations relevant to specific diseases.
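
To make the contrast between the two extraction strategies concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each document has already been reduced to the set of entity names it mentions, and that a hyper-edge is accepted when its entities co-occur in at least `MIN_SUPPORT` documents. The toy corpus, the entity names, the threshold, and the function names are all hypothetical; the abstract does not specify how the paper scores associations.

```python
from itertools import combinations

# Toy corpus (assumption): each document is the set of entities it mentions.
DOCS = [
    {"geneA", "proteinB", "diseaseC"},
    {"geneA", "proteinB", "diseaseC", "organD"},
    {"geneA", "proteinB"},
    {"geneA", "diseaseC"},
    {"proteinB", "organD"},
]
MIN_SUPPORT = 2  # assumed threshold: documents that must mention all entities


def support(entities, docs):
    """Count documents that mention every entity in the candidate hyper-edge."""
    return sum(1 for doc in docs if entities <= doc)


def exhaustive_hyperedges(docs, min_support, max_size):
    """Score every entity subset of size 2..max_size (combinatorial cost)."""
    vocab = sorted(set().union(*docs))
    found = set()
    for k in range(2, max_size + 1):
        for combo in combinations(vocab, k):
            if support(frozenset(combo), docs) >= min_support:
                found.add(frozenset(combo))
    return found


def apriori_hyperedges(docs, min_support, max_size):
    """Level-wise search: size-k candidates come only from frequent (k-1)-sets."""
    vocab = sorted(set().union(*docs))
    frequent = {
        frozenset([e]) for e in vocab
        if support(frozenset([e]), docs) >= min_support
    }
    found = set()
    for k in range(2, max_size + 1):
        # Join step: union pairs of frequent (k-1)-sets that yield a k-set.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        frequent = {c for c in candidates if support(c, docs) >= min_support}
        if not frequent:
            break  # no frequent k-sets, so no larger hyper-edge can be frequent
        found |= frequent
    return found


if __name__ == "__main__":
    exhaustive = exhaustive_hyperedges(DOCS, MIN_SUPPORT, max_size=4)
    pruned = apriori_hyperedges(DOCS, MIN_SUPPORT, max_size=4)
    print(exhaustive == pruned)
    for edge in sorted(pruned, key=lambda e: (len(e), sorted(e))):
        print(sorted(edge), support(edge, DOCS))
```

On this toy corpus both functions return the same set of hyper-edges, which mirrors the equivalence reported in the Results. The pruned search reaches that set without scoring three-way candidates such as {geneA, proteinB, organD}, because the pair {geneA, organD} is already below the threshold.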