A Privacy Preserving Markov Model for Sequence Classification

  • Authors: Suxin Guo; Sheng Zhong; Aidong Zhang

  • Affiliations: Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, 14260, U.S.A.; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, 14260, U.S.A.

  • Venue: Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
  • Year: 2013

Abstract

Sequence classification has attracted much interest in recent years, both because it differs from traditional classification tasks and because it has wide applications in fields such as bioinformatics. Because it is not easy to define specific "features" for sequence data as in traditional feature-based classification, many methods have been developed to exploit the particular characteristics of sequences. One common approach is to use probabilistic generative models, such as the Markov model, to learn the probability distribution of the sequences in each class. Privacy is an important consideration in sequence classification research. In many cases, especially in bioinformatics, sequence data contains sensitive information, which obstructs data mining. For example, the DNA and protein sequences of individuals are highly sensitive and should not be released without protection. In the real world, however, data is usually distributed among different parties, and a party that trains only on its own data may not obtain a sufficiently strong model. A problem therefore arises when several parties, each holding a set of sequences, want to learn Markov models on the union of their data but do not want to reveal their data to one another because of privacy concerns. In this paper, we address this problem and propose a method to train Markov models, from first order up to order k where k > 1, on sequence data distributed among parties without revealing each party's private sequences to the others. We apply homomorphic encryption to protect the sensitive information.
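
The setting described in the abstract can be illustrated with a minimal plaintext sketch. It is not the paper's protocol, which the abstract does not spell out. It only shows why per-party transition counts are a natural quantity to combine: an order-k Markov model for a class is determined by its (context, next symbol) counts, and the counts for the union of the data are the sum of each party's counts. Encryption is omitted entirely; under the assumption that an additively homomorphic scheme is used, the summation in `merge_counts` is the step that could be carried out on ciphertexts. All function names, the toy sequences, the class labels, and the add-one smoothing are illustrative assumptions.

```python
from collections import Counter
import math


def count_transitions(sequences, k):
    """Count (context, next_symbol) pairs for an order-k Markov model."""
    counts = Counter()
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[(seq[i - k:i], seq[i])] += 1
    return counts


def merge_counts(per_party_counts):
    """Sum the parties' counts. This is done in plaintext here; it is the
    step an additively homomorphic scheme could protect (assumption)."""
    total = Counter()
    for counts in per_party_counts:
        total.update(counts)
    return total


def log_likelihood(seq, counts, k, alphabet, alpha=1.0):
    """Log-probability of seq's transitions, with add-alpha smoothing."""
    context_totals = Counter()
    for (context, _), n in counts.items():
        context_totals[context] += n
    ll = 0.0
    for i in range(k, len(seq)):
        context, symbol = seq[i - k:i], seq[i]
        numerator = counts[(context, symbol)] + alpha
        denominator = context_totals[context] + alpha * len(alphabet)
        ll += math.log(numerator / denominator)
    return ll


def classify(seq, class_counts, k, alphabet):
    """Return the class whose Markov model gives seq the highest likelihood."""
    return max(
        class_counts,
        key=lambda label: log_likelihood(seq, class_counts[label], k, alphabet),
    )


# Toy data: two hypothetical parties, each holding labeled sequences.
K = 1
ALPHABET = "ACGT"
party1 = {"at_rich": ["ATATAT", "AATTAA"], "gc_rich": ["GCGCGC"]}
party2 = {"at_rich": ["TATATA"], "gc_rich": ["GGCCGG", "CGCGCG"]}

# One merged model per class, built from the parties' local counts.
models = {
    label: merge_counts(
        [count_transitions(party[label], K) for party in (party1, party2)]
    )
    for label in ("at_rich", "gc_rich")
}

print(classify("ATAT", models, K, ALPHABET))
print(classify("GCGC", models, K, ALPHABET))
```

Because the merged counts equal the counts computed directly on the pooled sequences, each party only needs to contribute its local counts rather than its raw sequences. Hiding those counts from the other parties is the part this sketch leaves out.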