Neurolinguistic approach to natural language processing with applications to medical text analysis

  • Authors:
  • Włodzisław Duch;Paweł Matykiewicz;John Pestian

  • Affiliations:
  • Department of Informatics, Nicolaus Copernicus University, Grudziądzka 5, 87-100 Toruń, Poland and School of Computer Engineering, Nanyang Technological University, 639798 Singapore, Singapore;School of Computer Engineering, Nanyang Technological University, 639798 Singapore, Singapore and Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, OH, USA;Department of Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, OH, USA

  • Venue:
  • Neural Networks
  • Year:
  • 2008

Abstract

Understanding written or spoken language presumably involves spreading neural activation in the brain. This process may be approximated by spreading activation in semantic networks, providing enhanced representations that involve concepts not found directly in the text. The approximation of this process is of great practical and theoretical interest. Although activations of the neural circuits involved in the representation of words change rapidly in time, snapshots of these activations spreading through associative networks may be captured in a vector model. Concepts of similar type activate larger clusters of neurons, priming areas in the left and right hemispheres. Analysis of recent brain imaging experiments shows the importance of the right hemisphere non-verbal clusterization. Medical ontologies enable development of a large-scale practical algorithm to re-create pathways of spreading neural activations. First, concepts of a specific semantic type are identified in the text, and then all related concepts of the same type are added to the text, providing expanded representations. To avoid rapid growth of the extended feature space, after each step only the most useful features, those that increase document clusterization, are retained. Short hospital discharge summaries are used to illustrate how this process works on real, very noisy data. Expanded texts show significantly improved clustering and may be classified with much higher accuracy. Although better approximations to the spreading of neural activations may be devised, the practical approach presented in this paper helps to discover pathways used by the brain to process specific concepts, and may be used in large-scale applications.
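
The expand-then-filter loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the document labels, the Jaccard-based separation score, and all function names are assumptions made for illustration. A real system would draw concepts and semantic types from a medical ontology and would likely use a different measure of cluster quality.

```python
from itertools import combinations

# Hypothetical toy ontology: concept -> (semantic type, related concepts).
ONTOLOGY = {
    "asthma": ("disease", {"bronchitis", "wheezing"}),
    "bronchitis": ("disease", {"asthma"}),
    "wheezing": ("symptom", {"asthma"}),
    "pneumonia": ("disease", {"bronchitis"}),
    "fracture": ("injury", {"cast"}),
    "cast": ("procedure", {"fracture"}),
}

def expand(doc, semantic_type):
    """Add related concepts of the same semantic type to a document (a set of terms)."""
    added = set()
    for concept in doc:
        ctype, related = ONTOLOGY.get(concept, (None, set()))
        if ctype != semantic_type:
            continue
        for r in related:
            if ONTOLOGY.get(r, (None, set()))[0] == semantic_type:
                added.add(r)
    return doc | added

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def separation(docs, labels):
    """Assumed score: mean same-label similarity minus mean cross-label similarity."""
    same, diff = [], []
    for i, j in combinations(range(len(docs)), 2):
        (same if labels[i] == labels[j] else diff).append(jaccard(docs[i], docs[j]))
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(same) - mean(diff)

def expand_and_filter(docs, labels, semantic_type):
    """One expansion step: keep a new feature only if it raises the separation score."""
    expanded = [expand(d, semantic_type) for d in docs]
    candidates = sorted(set().union(*expanded) - set().union(*docs))
    kept = [set(d) for d in docs]
    best = separation(kept, labels)
    for feature in candidates:
        # Add the feature only to documents whose expansion produced it.
        trial = [k | ({feature} & e) for k, e in zip(kept, expanded)]
        score = separation(trial, labels)
        if score > best:
            kept, best = trial, score
    return kept, best

# Made-up example documents and labels.
docs = [{"asthma", "cough"}, {"pneumonia", "fever"}, {"fracture", "pain"}]
labels = ["respiratory", "respiratory", "orthopedic"]
kept, best = expand_and_filter(docs, labels, "disease")
```

In this toy case the two respiratory documents share no terms before expansion, so their similarity is zero. Expansion adds "bronchitis" to both, which raises the separation score, so the feature is retained, while "wheezing" is never proposed because its semantic type differs.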