Automated data discovery in similarity score queries

  • Authors:
  • Fatih Altiparmak; Ali Saman Tosun; Hakan Ferhatosmanoglu; Ahmet Sacan

  • Affiliations:
  • The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH; The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH and The University of Texas at San Antonio, Dept. of Computer Science; The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH; The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH and Middle East Technical University, Dept. of Computer Eng., Ankara, Turkey

  • Venue:
  • DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
  • Year:
  • 2008

Abstract

A vast amount of information is stored in scientific databases on the web. The dynamic nature of scientific data, the cost of providing an up-to-date snapshot of the whole database, and proprietary considerations compel database owners to hide the original data behind search interfaces. The information is often provided to researchers through similarity-search query interfaces, which limits proper and focused analysis of the data. In this study, we present systematic methods of data discovery through similarity-score queries in such "uncooperative" databases. The methods are generalized to multidimensional data and to L-p norm distance functions. The accuracy and performance of our methods are demonstrated on synthetic and real-life datasets. The methods developed in this study enable scientists to obtain the data within the range of their research interests, overcoming the limitations of the similarity-search interface. The results of this study also have implications for data privacy and security, where the discovery of the original data is not desired.
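
To make the idea of discovering data through similarity scores concrete, the following is a minimal sketch and not the method of the paper. It assumes a simplified, hypothetical interface that returns the exact Euclidean (L-2) distance between a query and a single hidden record. Under that assumption, a d-dimensional record can be recovered with d + 1 queries: one at the origin and one at each unit vector. The names `make_score_oracle` and `recover_point` are illustrative only.

```python
import math


def make_score_oracle(hidden_point):
    """Simulate a search interface that reveals only a distance score.

    Assumption: the score is the exact Euclidean distance to one hidden record.
    """
    def score(query):
        return math.dist(query, hidden_point)
    return score


def recover_point(score, dim):
    """Recover the hidden point from dim + 1 distance queries."""
    origin = [0.0] * dim
    d0_sq = score(origin) ** 2  # equals sum of x_j^2
    recovered = []
    for i in range(dim):
        unit = origin.copy()
        unit[i] = 1.0
        di_sq = score(unit) ** 2  # equals sum of x_j^2 - 2*x_i + 1
        # d0^2 - di^2 = 2*x_i - 1, hence x_i = (d0^2 - di^2 + 1) / 2
        recovered.append((d0_sq - di_sq + 1.0) / 2.0)
    return recovered


hidden = [3.0, -1.5, 0.25]        # unknown to the "attacker"
oracle = make_score_oracle(hidden)
estimate = recover_point(oracle, dim=3)
```

Here `estimate` matches `hidden` up to floating-point rounding. Real interfaces typically return scores for many records at once and may use other L-p norms, where this closed-form subtraction no longer applies directly, so the sketch only illustrates why exposing raw similarity scores can leak the underlying data.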