A Privacy Preserving Markov Model for Sequence Classification

Authors:
Suxin Guo; Sheng Zhong; Aidong Zhang
Affiliations:
Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, 14260, U.S.A.; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, 14260, U.S.A.
Venue:
Proceedings of the International Conference on Bioinformatics, Computational Biology and Biomedical Informatics
Year:
2013

Abstract

Sequence classification has attracted much interest in recent years because it differs from traditional classification tasks and has wide applications in many fields, such as bioinformatics. Because it is hard to define specific "features" for sequence data as in traditional feature-based classification, many methods have been developed to exploit the particular characteristics of sequences. One common way of classifying sequence data is to use probabilistic generative models, such as the Markov model, to learn the probability distribution of sequences in each class. Privacy is an important consideration in sequence classification research. In many cases, especially in bioinformatics, sequence data contains sensitive information, which obstructs mining of the data. For example, the DNA and protein sequences of individuals are highly sensitive and should not be released without protection. In the real world, however, data is usually distributed among different parties, and a party that trains only on its own data may not obtain a strong enough model. This raises a problem when several parties, each holding a set of sequences, want to learn Markov models on the union of their data but do not want to reveal their data to the others because of privacy concerns. In this paper, we address this problem and propose a method to train Markov models, from first order up to order k where k > 1, on sequence data distributed among parties without revealing each party's private sequences to the others. We apply homomorphic encryption to protect the sensitive information.
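
The idea of estimating a Markov model from pooled data without exchanging raw sequences can be illustrated with a minimal sketch. It assumes a first-order model over a hypothetical DNA alphabet, two parties, and a toy Paillier cryptosystem as the additively homomorphic scheme. The abstract does not say which scheme or protocol the paper uses, so every name below (`keygen`, `encrypt`, `aggregate`, `transition_probs`, the fixed primes) is illustrative, and the single trusted key holder is a simplification rather than the authors' protocol.

```python
import math
import random
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumed alphabet, for illustration only
PAIRS = [a + b for a, b in product(ALPHABET, repeat=2)]

def keygen(p=2**31 - 1, q=2**61 - 1):
    # Toy Paillier key from two fixed Mersenne primes: far too small and
    # too predictable for real use, chosen only to keep the sketch short.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    # Enc(m) = (1 + m*n) * r^n mod n^2, with r random and coprime to n.
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    # Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n.
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return c1 * c2 % (n * n)

def transition_counts(sequences):
    # Local first-order statistics: how often symbol b follows symbol a.
    counts = Counter()
    for s in sequences:
        for a, b in zip(s, s[1:]):
            counts[a + b] += 1
    return [counts[pair] for pair in PAIRS]

def aggregate(n, encrypted_vectors):
    # Element-wise homomorphic sum of the parties' encrypted count vectors.
    total = encrypted_vectors[0]
    for vec in encrypted_vectors[1:]:
        total = [add_encrypted(n, x, y) for x, y in zip(total, vec)]
    return total

def transition_probs(count_vector):
    # Normalize each row of joint counts into transition probabilities.
    counts = dict(zip(PAIRS, count_vector))
    probs = {}
    for a in ALPHABET:
        row = sum(counts[a + b] for b in ALPHABET)
        for b in ALPHABET:
            probs[a + b] = counts[a + b] / row if row else 0.0
    return probs

n, priv = keygen()
party_a = ["ACGT", "AACC"]  # made-up private sequences
party_b = ["ACAC", "GTGT"]
enc_a = [encrypt(n, c) for c in transition_counts(party_a)]
enc_b = [encrypt(n, c) for c in transition_counts(party_b)]
joint = [decrypt(n, priv, c) for c in aggregate(n, [enc_a, enc_b])]
probs = transition_probs(joint)
```

In this sketch only the summed counts are ever decrypted, so the key holder sees the joint transition statistics but not either party's individual counts or sequences. Extending it to order k would mean counting transitions from length-k contexts instead of single symbols; how the paper handles that, and how it avoids a trusted key holder, is not stated in the abstract.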