Concept extraction and association from cancer literature

  • Authors:
  • Yueyu Fu; Travis Bauer; Javed Mostafa; Mathew Palakal; Snehasis Mukhopadhyay

  • Affiliations:
  • School of Library and Information Science, Indiana University, Bloomington, IN; Computer Science Department, Indiana University, Bloomington, IN; Informatics and Information Science, Indiana University, Bloomington, IN; Indiana University-Purdue University, Indianapolis, IN; Indiana University-Purdue University, Indianapolis, IN

  • Venue:
  • Proceedings of the 4th international workshop on Web information and data management
  • Year:
  • 2002

Abstract

There is a large and growing body of web-accessible biomedical literature. As this body of electronic literature grows, so does the possibility that document analysis techniques can be used to automatically extract useful biomedical information from it, particularly to discover key concepts dealing with genes, proteins, drugs, and diseases, and the associations among these concepts. VCGS (Vocabulary Cluster Generating System) was designed to automatically extract tokens and determine associations among them from one subset of the biomedical literature, namely cancer literature. Such information has notable potential to automate database construction in biomedicine, instead of relying on experts' analysis. This paper reports on the mechanisms for automatically generating clusters of tokens. A formal evaluation of the system, based on a subset of 5338 PubMed titles and abstracts, has been conducted against the Swiss-Prot database, in which the associations among concepts are entered manually by experts.