Secure outsourcing of sequence comparisons

  • Authors: Mikhail J. Atallah; Jiangtao Li

  • Affiliations: CERIAS and Department of Computer Sciences, Purdue University, West Lafayette, IN (both authors)

  • Venue: PET'04 Proceedings of the 4th international conference on Privacy Enhancing Technologies
  • Year: 2004

Abstract

Large-scale problems in the physical and life sciences are being revolutionized by Internet computing technologies, such as grid computing, which make possible the massive cooperative sharing of computational power, bandwidth, storage, and data. A weak computational device, once connected to such a grid, is no longer limited by its slow speed, small amounts of local storage, and limited bandwidth: it can avail itself of the abundance of these resources that is available elsewhere on the network. An impediment to the use of “computational outsourcing” is that the data in question is often sensitive, e.g., of national security importance, or proprietary and containing commercial secrets, or to be kept private to satisfy legal requirements such as the HIPAA legislation, Gramm-Leach-Bliley, or similar laws. This motivates the design of techniques for computational outsourcing in a privacy-preserving manner, i.e., without revealing either one's data or the outcome of the computation on the data to the remote agents whose computational power is being used. This paper investigates such secure outsourcing for widely applicable sequence comparison problems, and gives an efficient protocol for a customer to securely outsource sequence comparisons to two remote agents, such that the agents learn nothing about the customer's two private sequences or the result of the comparison. The local computations done by the customer are linear in the size of the sequences, and the computational cost and amount of communication done by the external agents are close to the time complexity of the best known algorithm for solving the problem on a single machine (i.e., quadratic, which is a huge computational burden for the kinds of massive data on which such comparisons are made). The sequence comparison problem considered arises in a large number of applications, including speech recognition, machine vision, and molecular sequence comparisons. In addition, essentially the same protocol can solve a larger class of problems whose standard dynamic programming solutions are similar in structure to the recurrence that subtends the sequence comparison algorithm.
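
For orientation, the quadratic single-machine baseline mentioned in the abstract can be illustrated with the textbook edit-distance dynamic program. The sketch below is not the paper's protocol and includes none of its privacy-preserving machinery. The function name `edit_distance` and the unit costs `ins_cost`, `del_cost`, and `sub_cost` are assumptions chosen for illustration; the abstract does not specify a cost model.

```python
def edit_distance(a, b, ins_cost=1, del_cost=1, sub_cost=1):
    """Textbook O(len(a) * len(b)) dynamic program for edit distance.

    Illustrative only: cost parameters are assumed, not taken from the paper.
    """
    n, m = len(a), len(b)
    # prev[j] = cost of transforming the empty prefix of a into b[:j]
    prev = [j * ins_cost for j in range(m + 1)]
    for i in range(1, n + 1):
        # curr[0] = cost of deleting the first i symbols of a
        curr = [i * del_cost] + [0] * m
        for j in range(1, m + 1):
            # Each cell depends only on its left, upper, and upper-left
            # neighbours, which is the recurrence structure the abstract
            # alludes to.
            substitute = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            delete = prev[j] + del_cost
            insert = curr[j - 1] + ins_cost
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[m]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

The table has (n + 1)(m + 1) cells, each filled in constant time, which is the quadratic cost that the abstract says is shifted to the remote agents while the customer's own work stays linear.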