Efficient Privacy Preserving Protocols for Similarity Join

Authors:
Bilal Hawashin;Farshad Fotouhi;Traian Marius Truta;William Grosky
Affiliations:
Dept. of Computer Science, Wayne State University, Detroit, MI 48202. e-mail: hawashin@wayne.edu;Dept. of Computer Science, Wayne State University, Detroit, MI 48202. e-mail: fotouhi@wayne.edu;Dept. of Computer Science, Northern Kentucky University, Highland Heights, KY 41099, USA. e-mail: trutat1@nku.edu;Dept. of Computer and Information Science, University of Michigan - Dearborn, Dearborn, MI 48128, USA. e-mail: wgrosky@umich.edu
Venue:
Transactions on Data Privacy
Year:
2012

References:

Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML '98 Proceedings of the 10th European Conference on Machine Learning
A Comparative Study on Feature Selection in Text Categorization. ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Properties of Embedding Methods for Similarity Searching in Metric Spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence
Information sharing across private databases. Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Adaptive duplicate detection using learnable string similarity measures. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Blocking-aware private record linkage. Proceedings of the 2nd international workshop on Information quality in information systems
A Heterogeneous Field Matching Method for Record Linkage. ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Duplicate Record Detection: A Survey. IEEE Transactions on Knowledge and Data Engineering
Privacy preserving schema and data matching. Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Example-driven design of efficient record matching queries. VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Privacy Preserving Record Linkage Using Phonetic Codes. BCI '09 Proceedings of the 2009 Fourth Balkan Conference in Informatics
Diffusion Maps: A Superior Semantic Method to Improve Similarity Join Performance. ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops

Abstract

During a similarity join, one or more sources may not allow their data to be shared with other sources. In this case, a privacy-preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, can improve similarity join accuracy using supervised learning. However, the existing secure protocols for similarity join cannot be used to join sources on these long attributes. Moreover, the majority of the existing privacy-preserving protocols do not consider semantic similarities during the similarity join process. In this paper, we introduce a secure, efficient protocol to semantically join sources when the join attributes are long attributes. We provide two secure protocols, one for the scenario in which a training set exists and one for the scenario in which no training set is available. Furthermore, we introduce a multi-label supervised secure protocol and an expandable supervised secure protocol. Results show that our protocols can efficiently join sources on long attributes by considering the semantic relationships among the long string values, and thereby improve the overall secure similarity join performance.