A privacy preserving efficient protocol for semantic similarity join using long string attributes

Authors:
Bilal Hawashin;Farshad Fotouhi;Traian Marius Truta
Affiliations:
Wayne State University Detroit, MI;Wayne State University Detroit, MI;Northern Kentucky University, Highland Heights, KY
Venue:
Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society
Year:
2011

Citing 7
Cited 2

Properties of Embedding Methods for Similarity Searching in Metric Spaces

IEEE Transactions on Pattern Analysis and Machine Intelligence
Information sharing across private databases

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Blocking-aware private record linkage

Proceedings of the 2nd international workshop on Information quality in information systems
Duplicate Record Detection: A Survey

IEEE Transactions on Knowledge and Data Engineering
Privacy preserving schema and data matching

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Privacy Preserving Record Linkage Using Phonetic Codes

BCI '09 Proceedings of the 2009 Fourth Balkan Conference in Informatics
Diffusion Maps: A Superior Semantic Method to Improve Similarity Join Performance

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops

Protocol to compute polygon intersection in STC model

ICICA'11 Proceedings of the Second international conference on Information Computing and Applications
A taxonomy of privacy-preserving record linkage techniques

Information Systems

Abstract

During a similarity join, one or more sources may not allow their full data to be shared with the other sources. In this case, a privacy-preserving similarity join is required. We showed in our previous work [4] that using long attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, could improve similarity join accuracy under supervised learning. However, the existing secure protocols for similarity join cannot be used to join tables on these long attributes. Moreover, the majority of the existing privacy-preserving protocols do not consider semantic similarities during the similarity join process. In this paper, we introduce a secure and efficient protocol to semantically join tables when the join attributes are long attributes. Furthermore, instead of using machine learning methods, which are not always applicable, we use similarity thresholds to decide matched pairs. Results show that our protocol can efficiently join tables on long attributes by considering the semantic relationships among the long string values. It therefore improves the overall secure similarity join performance.
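
To make the idea of deciding matched pairs by a similarity threshold concrete, here is a minimal sketch in Python. It is an illustration only, not the protocol from the paper: it uses plain TF-IDF cosine similarity as a stand-in for a semantic method, it includes no privacy-preserving step, and the names `tfidf_vectors`, `cosine`, and `threshold_join` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of long strings."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF so that no term weight becomes zero.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def threshold_join(left, right, threshold):
    """Return (i, j, similarity) for every pair whose similarity >= threshold.

    Assumption: term weights are computed over both tables together, which a
    real privacy-preserving protocol would not be able to do in the clear.
    """
    vectors = tfidf_vectors(list(left) + list(right))
    left_vecs, right_vecs = vectors[: len(left)], vectors[len(left):]
    matches = []
    for i, u in enumerate(left_vecs):
        for j, v in enumerate(right_vecs):
            sim = cosine(u, v)
            if sim >= threshold:
                matches.append((i, j, sim))
    return matches
```

Calling `threshold_join(left_values, right_values, 0.5)` on two lists of long strings returns the index pairs treated as matches. The threshold value is a tuning choice that trades precision against recall, and 0.5 here is arbitrary.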