Efficient range queries over uncertain strings

  • Authors:
  • Dongbo Dai;Jiang Xie;Huiran Zhang;Jiaqi Dong

  • Affiliations:
  • School of Computer Engineering and Science, Shanghai University, Shanghai, China;School of Computer Engineering and Science, Shanghai University, Shanghai, China, Department of Mathematics, University of California, Irvine, CA;School of Computer Engineering and Science, Shanghai University, Shanghai, China;School of Computer Science, Fudan University, Shanghai, China

  • Venue:
  • SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
  • Year:
  • 2012

Abstract

Edit-distance-based string range queries are used extensively in data integration, keyword search, biological function prediction, and many other applications. In the presence of uncertainty, however, answering range queries is more challenging than in deterministic scenarios, since there are exponentially many possible worlds to consider. This work extends existing filtering techniques tailored for deterministic strings to uncertain settings. We first design a probabilistic q-gram filtering method that is both efficient and effective. Another filtering technique, frequency-distance-based filtering, is also adapted to work with uncertain strings. To achieve further speed-up, we combine two state-of-the-art approaches, based on cumulative distribution functions and local perturbation, to improve the lower and upper bounds. Comprehensive experimental results show that, in uncertain settings, our filter-based scheme is more efficient than existing methods that leverage only cumulative distribution functions or local perturbation.
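
For orientation, the two filters named above can be illustrated in their classical deterministic form. The sketch below is not the probabilistic method of the paper, whose details the abstract does not give; it only shows the standard q-gram count filter and the frequency distance lower bound for ordinary strings. All function names are hypothetical, and q = 2 is an assumed default.

```python
from collections import Counter


def qgrams(s, q):
    """Bag (multiset) of the overlapping q-grams of s, without padding."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def passes_count_filter(s, t, k, q=2):
    """Necessary condition for edit_distance(s, t) <= k.

    A single edit operation destroys at most q q-grams, so two strings
    within distance k share at least max(|s|, |t|) - q + 1 - k*q q-grams.
    """
    if abs(len(s) - len(t)) > k:  # length filter
        return False
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= max(len(s), len(t)) - q + 1 - k * q


def frequency_distance(s, t):
    """Lower bound on edit distance from character frequency vectors."""
    fs, ft = Counter(s), Counter(t)
    surplus_s = sum((fs - ft).values())  # characters s has in excess
    surplus_t = sum((ft - fs).values())  # characters t has in excess
    return max(surplus_s, surplus_t)


def edit_distance(s, t):
    """Plain dynamic-programming Levenshtein distance, used for verification."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def range_query(query, strings, k, q=2):
    """Return the strings within edit distance k of query (filter, then verify)."""
    results = []
    for s in strings:
        if not passes_count_filter(query, s, k, q):
            continue
        if frequency_distance(query, s) > k:
            continue
        if edit_distance(query, s) <= k:
            results.append(s)
    return results
```

Both filters are only necessary conditions, so every surviving candidate is still verified with the full edit distance computation; the filters serve to prune candidates cheaply before that step.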