A vast amount of information is stored in scientific databases on the web. The dynamic nature of scientific data, the cost of providing an up-to-date snapshot of the whole database, and proprietary considerations compel database owners to hide the original data behind search interfaces. The information is often made available to researchers only through similarity-search query interfaces, which limits proper and focused analysis of the data. In this study, we present systematic methods for data discovery through similarity-score queries in such "uncooperative" databases. The methods are generalized to multidimensional data and to L_p norm distance functions. The accuracy and performance of our methods are demonstrated on synthetic and real-life datasets. The methods developed in this study enable scientists to obtain the data within the range of their research interests, overcoming the limitations of the similarity-search interface. The results also have implications for data privacy and security, where discovery of the original data is not desired.
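
To make the idea of discovery through similarity scores concrete, here is a minimal sketch. It is not the method of this study. It assumes an idealized interface that returns the exact Euclidean (L_2) distance between a query and a single hidden record. The names `make_score_oracle` and `recover_point` are hypothetical and introduced only for illustration. Under these assumptions, a d-dimensional point can be recovered with d + 1 probes: one at the origin and one at each unit vector, since d_0^2 - d_i^2 = 2 x_i - 1.

```python
import math

def make_score_oracle(hidden):
    """Simulate a search interface that reveals only a distance score.

    Assumption: the score is the exact Euclidean distance to one hidden point.
    """
    def score(query):
        return math.dist(query, hidden)
    return score

def recover_point(score, dim):
    """Recover the hidden point using dim + 1 similarity-score queries."""
    origin = [0.0] * dim
    d0_sq = score(origin) ** 2            # squared distance from the origin
    recovered = []
    for i in range(dim):
        probe = origin.copy()
        probe[i] = 1.0                    # unit vector along axis i
        di_sq = score(probe) ** 2
        # d0^2 - di^2 = 2*x_i - 1, so x_i = (d0^2 - di^2 + 1) / 2
        recovered.append((d0_sq - di_sq + 1.0) / 2.0)
    return recovered

hidden = [0.25, -1.5, 3.0]                # made-up record, unknown to the client
oracle = make_score_oracle(hidden)
estimate = recover_point(oracle, len(hidden))
```

Real interfaces typically return ranked or rounded scores over many records, so this closed-form recovery should be read only as the simplest case of the problem the abstract describes.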