Automated data discovery in similarity score queries

Authors:
Fatih Altiparmak;Ali Saman Tosun;Hakan Ferhatosmanoglu;Ahmet Sacan
Affiliations:
The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH;The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH and The University of Texas at San Antonio, Dept. of Computer Science;The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH;The Ohio State University, Dept. of Computer Sci. & Eng., Columbus, OH and Middle East Technical University, Dept. of Computer Eng., Ankara, Turkey
Venue:
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Year:
2008


Abstract

A vast amount of information is stored in scientific databases on the web. The dynamic nature of scientific data, the cost of providing an up-to-date snapshot of the whole database, and proprietary considerations compel database owners to hide the original data behind search interfaces. The information is often provided to researchers only through similarity-search query interfaces, which limits proper and focused analysis of the data. In this study, we present systematic methods for data discovery through similarity-score queries in such "uncooperative" databases. The methods are generalized to multidimensional data and to L_p-norm distance functions. The accuracy and performance of our methods are demonstrated on synthetic and real-life datasets. The methods developed in this study enable scientists to obtain the data within the range of their research interests, overcoming the limitations of the similarity-search interface. The results also have implications for data privacy and security, where discovery of the original data is not desired.
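
To make the idea of data discovery through similarity scores concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical interface (`make_oracle`) that returns the exact Euclidean (L_2) distance between a query and a single hidden record; the function names and the choice of probe points are illustrative assumptions. Under these assumptions, probing the origin and each unit vector gives d + 1 scores from which a d-dimensional record can be recovered in closed form, since the squared distance to the unit vector e_i equals the squared distance to the origin minus 2 x_i plus 1.

```python
import math


def make_oracle(hidden):
    """Hypothetical similarity-search interface (an assumption for this sketch).

    It exposes only the Euclidean distance from a query to one hidden record.
    """
    def score(query):
        return math.dist(hidden, query)
    return score


def discover_point(score, dim):
    """Recover the hidden record from dim + 1 distance queries.

    With d0 = ||x|| and di = ||x - e_i||, we have
    di^2 = d0^2 - 2 * x_i + 1, so x_i = (d0^2 - di^2 + 1) / 2.
    """
    origin = [0.0] * dim
    d0_sq = score(origin) ** 2
    recovered = []
    for i in range(dim):
        probe = origin.copy()
        probe[i] = 1.0  # unit vector e_i used as the i-th probe query
        di_sq = score(probe) ** 2
        recovered.append((d0_sq - di_sq + 1.0) / 2.0)
    return recovered


hidden = [3.0, -1.5, 0.25]  # illustrative hidden record
oracle = make_oracle(hidden)
estimate = discover_point(oracle, len(hidden))
```

In this toy setting `estimate` matches `hidden` up to floating-point error. A real interface would typically return scores for many records at once, may rank or truncate results, and may use a different L_p norm, all of which this sketch ignores.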