CoMRI: A Compressed Multi-Resolution Index Structure for Sequence Similarity Queries

  • Authors:
  • Hong Sun; Ozgur Ozturk; Hakan Ferhatosmanoglu

  • Venue:
  • CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2003

Abstract

In this paper, we present CoMRI (Compressed Multi-Resolution Index), our system for fast sequence similarity search in DNA sequence databases. We employ the Virtual Bounding Rectangle (VBR) concept to build a compressed, grid-style index structure. An advantage of the grid format over trees is that subsequence location information is given by the order of the corresponding VBR in the VBR list. By taking advantage of VBRs, our index structure fits easily into a reasonable amount of memory. Together with a new, optimized multi-resolution search algorithm, this improves query speed significantly. Extensive performance evaluations on Human Chromosome sequence data show that VBRs save 80%-93% of index storage compared to MBRs (Minimum Bounding Rectangles), and that the new search algorithm prunes almost all unnecessary VBRs, ensuring efficient disk I/O and CPU usage. According to the results of our experiments, CoMRI is at least 100 times faster than MRS, another grid index structure introduced very recently.
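
The point that a grid index gets subsequence locations from the order of its rectangles can be illustrated with a minimal sketch. This is not the CoMRI implementation: the abstract does not describe how VBRs are compressed, so the sketch uses plain bounding boxes over letter-count vectors, and every name in it (freq_vector, build_boxes, box_distance, candidate_ranges, window, capacity, max_dist) is hypothetical. The box distance is a simple per-letter count gap chosen for illustration.

```python
from typing import List, Tuple

ALPHABET = "ACGT"
Vec = Tuple[int, ...]
Box = Tuple[Vec, Vec]  # (per-letter minimum counts, per-letter maximum counts)


def freq_vector(s: str) -> Vec:
    """Count occurrences of each letter of ALPHABET in s."""
    return tuple(s.count(c) for c in ALPHABET)


def build_boxes(seq: str, window: int, capacity: int) -> List[Box]:
    """Group consecutive sliding windows into boxes, `capacity` windows each."""
    vectors = [freq_vector(seq[i:i + window])
               for i in range(len(seq) - window + 1)]
    boxes = []
    for start in range(0, len(vectors), capacity):
        group = vectors[start:start + capacity]
        lo = tuple(min(col) for col in zip(*group))
        hi = tuple(max(col) for col in zip(*group))
        # No offset is stored: the position in the list encodes the location.
        boxes.append((lo, hi))
    return boxes


def box_distance(q: Vec, box: Box) -> int:
    """Smallest count gap between q and any vector inside the box (assumed metric)."""
    lo, hi = box
    surplus = sum(l - x for x, l in zip(q, lo) if l > x)  # box needs more of a letter
    deficit = sum(x - h for x, h in zip(q, hi) if x > h)  # query has more of a letter
    return max(surplus, deficit)


def candidate_ranges(boxes: List[Box], query: str, capacity: int,
                     n_windows: int, max_dist: int) -> List[Tuple[int, int]]:
    """Return half-open ranges of window start positions that survive pruning."""
    q = freq_vector(query)
    ranges = []
    for k, box in enumerate(boxes):
        if box_distance(q, box) <= max_dist:
            first = k * capacity  # location recovered from list order alone
            ranges.append((first, min(first + capacity, n_windows)))
    return ranges


seq = "AAAACCCCGGGGTTTT"
boxes = build_boxes(seq, window=4, capacity=4)
n_windows = len(seq) - 4 + 1
print(candidate_ranges(boxes, "GGTT", 4, n_windows, max_dist=0))
```

For this toy sequence the call prints [(8, 12)]: only the third box can contain a window with the same letter counts as the query, and its start positions follow from its index in the list, so no per-box offset needs to be stored.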