A bare bones approach to literature-based discovery: an analysis of the Raynaud's/fish-oil and migraine-magnesium discoveries in semantic space

  • Authors:
  • R. J. Cole; P. D. Bruza

  • Affiliations:
  • School of Info. Tech. and Elec. Eng., University of Queensland; Distributed Systems Technology Centre, University of Queensland

  • Venue:
  • DS'05 Proceedings of the 8th international conference on Discovery Science
  • Year:
  • 2005

Abstract

Literature discovery can be characterized as a goal-directed search for previously unknown implicit knowledge captured within a collection of scientific articles. Swanson's serendipitous discovery of a treatment for Raynaud's disease by dietary fish-oil while browsing Medline, an online collection of biomedical literature, exemplifies such a discovery. By means of a series of experiments, the impact of stop words, various weighting schemes, discovery mechanisms, and contextual reduction is studied in relation to replicating the Raynaud/fish-oil and migraine-magnesium discoveries by operational means. Two aspects of discovery were brought into focus: (i) the discovery of intermediate terms, or B-terms, and (ii) the discovery of indirect A-C connections via the B-terms. A semantic space representation of the underlying corpus is computed, and discoveries are automated by computing associations between words in both higher and contextually reduced spaces. It was found that the discovery of B-terms and A-C connections can be achieved to an encouraging degree with a standard stop word list. In addition, no single weighting scheme seems to suffice. Log-likelihood appears to be potentially effective for leading to the discovery of B-terms, whereas odds ratio and simple co-occurrence frequencies both facilitate the discovery of A-C connections. With regard to discovery mechanism, both semantic similarity (via cosine) and information flow computation seem promising for computing A-C connections, but more research is needed to understand their relative strengths and weaknesses. Discovery in a contextually reduced semantic space produced mixed results.
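
The A-B-C idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the semantic space is built, so the sliding-window co-occurrence construction, the window size, the tiny stop word list, the function names, and the four-document toy corpus below are all assumptions made for illustration. The sketch uses simple co-occurrence frequencies as weights and cosine as the discovery mechanism; the log-likelihood, odds ratio, and information flow variants mentioned in the abstract are not shown.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical stop word list and toy corpus, invented for illustration only.
STOP_WORDS = {"and", "of", "in", "the", "a", "by"}
DOCS = [
    "raynaud and blood viscosity",
    "raynaud and platelet aggregation",
    "fish-oil and blood viscosity",
    "fish-oil and platelet aggregation",
    "aspirin and headache relief",
]


def build_space(docs, window=2):
    """Map each word to a vector of co-occurrence counts within a sliding window."""
    space = defaultdict(Counter)
    for doc in docs:
        tokens = [t for t in doc.lower().split() if t not in STOP_WORDS]
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    space[word][tokens[j]] += 1
    return space


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def b_terms(space, a):
    """Candidate B-terms: words that co-occur directly with the A-term."""
    return set(space.get(a, {}))


def indirect_candidates(space, a):
    """Rank candidate C-terms that never co-occur with A but have similar vectors."""
    vec_a = space.get(a, Counter())
    scores = {}
    for c, vec_c in space.items():
        if c == a or c in vec_a:
            continue  # skip A itself and anything directly linked to A
        score = cosine(vec_a, vec_c)
        if score > 0:
            scores[c] = score
    return sorted(scores.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    space = build_space(DOCS)
    print(sorted(b_terms(space, "raynaud")))
    print(indirect_candidates(space, "raynaud"))
```

In this toy corpus "raynaud" and "fish-oil" never appear in the same document, yet they share the same intermediate terms, so "fish-oil" is the only candidate returned for "raynaud". A real replication would need a corpus such as Medline titles and a weighting scheme chosen per task, as the abstract suggests.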