Integrated term weighting, visualization, and user interface development for bioinformation retrieval

  • Authors:
  • Min Hong;Anis Karimpour-Fard;Steve Russell;Lawrence Hunter

  • Affiliations:
  • Bioinformatics, University of Colorado Health Sciences Center, Denver, CO;Bioinformatics, University of Colorado Health Sciences Center, Denver, CO;Bioinformatics, University of Colorado Health Sciences Center, Denver, CO;Bioinformatics, University of Colorado Health Sciences Center, Denver, CO

  • Venue:
  • AIS'04 Proceedings of the 13th international conference on AI, Simulation, and Planning in High Autonomy Systems
  • Year:
  • 2004

Visualization

Abstract

This project implements an integrated biological information website that classifies technical documents, learns about users' interests, and offers intuitive interactive visualization for navigating vast information spaces. The effective use of modern software engineering principles, system environments, and development approaches is demonstrated. Straightforward yet powerful document characterization strategies are illustrated, helpful visualization for effective knowledge transfer is shown, and current user interface methodologies are applied. A notable success is the collaboration of disparately skilled specialists to rapidly deliver a flexible integrated prototype that meets user acceptance and performance goals. The domain chosen for the demonstration is breast cancer, using a corpus of abstracts from publications obtained online from Medline. The terms in the abstracts are extracted using word stemming and a stop list, and are encoded in vectors. A TF-IDF technique is implemented to calculate similarity scores between a set of documents and a query. Polysemy and synonymy are explicitly addressed. Groups of related and useful documents are identified using interactive visual displays such as a spiral graph that represents the overall similarity of documents. K-means clustering of the similarities among a document set is used to display a 3-D relationship map. User identities are established and updated by observing the patterns of terms used in their queries, and from login site locations. Explicit considerations of changing user category profiles, site stakeholders, information modeling, and networked technologies are pointed out.
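
The term extraction and TF-IDF scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop list, the naive suffix stripper (a stand-in for a real stemmer such as Porter's), the smoothed IDF formula, the use of cosine similarity, and all function names are assumptions made for illustration, and the handling of polysemy and synonymy mentioned in the abstract is not covered.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop list; the abstract does not give the actual one.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to", "with"}


def crude_stem(word):
    """Naive suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split into alphabetic words, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def idf_table(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are ignored."""
    return {t: count * idf[t] for t, count in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document index) pairs, most similar document first."""
    token_lists = [tokenize(d) for d in documents]
    idf = idf_table(token_lists)
    query_vec = tfidf_vector(tokenize(query), idf)
    scores = [
        (cosine(query_vec, tfidf_vector(tokens, idf)), i)
        for i, tokens in enumerate(token_lists)
    ]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))


if __name__ == "__main__":
    # Made-up example texts, not taken from the Medline corpus.
    docs = [
        "Breast cancer screening with mammography",
        "Gene expression in breast tumors",
        "Protein folding simulations",
    ]
    for score, index in rank("breast cancer", docs):
        print(f"{score:.3f}  {docs[index]}")
```

In a sketch like this, the same sparse vectors could also feed a pairwise similarity matrix for clustering or for ordering documents along a visual layout, though how the original system does so is not specified in the abstract.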