Using term lists and inverted files to improve search speed for metabolic pathway databases

  • Authors:
  • Greeshma Neglur; Robert L. Grossman; Natalia Maltsev; Clement Yu

  • Affiliations:
  • Laboratory for Advanced Computing, University of Illinois at Chicago, Chicago, IL;Laboratory for Advanced Computing, University of Illinois at Chicago, Chicago, IL;Argonne National Laboratory, Math and Computer Science Division, Argonne, IL;Department of Computer Science, University of Illinois at Chicago, Chicago, IL

  • Venue:
  • DILS'06: Proceedings of the Third International Conference on Data Integration in the Life Sciences
  • Year:
  • 2006

Abstract

This paper describes a technique for efficiently searching a pathway database for metabolic pathways similar to a given query pathway. Metabolic pathways can be converted into labeled directed graphs in which the nodes represent chemical compounds. Similarity between two graphs can be computed using a metric based on the Maximal Common Subgraph (MCS). By maintaining an inverted file that indexes all pathways in a database on their edges, our algorithm finds and ranks all pathways similar to the user's query pathway in time that is linear in the total number of occurrences, across the entire database, of the edges shared with the query.
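
The edge-indexed inverted file described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each pathway is given as a list of directed edges between uniquely labeled compounds, so that the common subgraph of two pathways reduces to their shared edges. The score used here (shared edges divided by the size of the larger edge set) is an assumed stand-in for the MCS-based metric, which the abstract does not define. The names `build_inverted_file`, `rank_similar`, and the toy pathways are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(pathways):
    """Map each edge to the set of pathway ids that contain it."""
    index = defaultdict(set)
    for pid, edges in pathways.items():
        for edge in set(edges):
            index[edge].add(pid)
    return index


def rank_similar(query_edges, index, sizes):
    """Rank pathways sharing at least one edge with the query.

    The counting loop touches one posting per occurrence of a query
    edge in the database, which is the linear-time part of the search.
    The final sort over the matched pathways is an extra step added
    here for presentation.
    """
    query = set(query_edges)
    common = defaultdict(int)
    for edge in query:
        for pid in index.get(edge, ()):
            common[pid] += 1
    # Assumed score: shared edges over the larger of the two edge sets.
    scored = [(pid, n / max(len(query), sizes[pid])) for pid, n in common.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy database: compound labels are placeholders, not real metabolites.
pathways = {
    "P1": [("A", "B"), ("B", "C"), ("C", "D")],
    "P2": [("A", "B"), ("B", "E")],
    "P3": [("X", "Y")],
}
index = build_inverted_file(pathways)
sizes = {pid: len(set(edges)) for pid, edges in pathways.items()}
ranking = rank_similar([("A", "B"), ("B", "C")], index, sizes)
print(ranking)
```

Pathways that share no edge with the query, such as `P3` above, never appear in any posting list that the search visits, so they cost nothing at query time.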