Use and Maintenance of Histograms for Large Scientific Database Access Planning: A Case Study of a Pharmaceutical Data Repository

Authors:
Zina Ben Miled;Jin Liu;Omran Bukhres;Huian Li;Jesse Martin;Chavali Balagopalakrishna;Robert Oppelt
Affiliations:
Electrical & Computer Engineering, School of Eng. & Tech., Indiana University Purdue University Indianapolis, IN, 46202, USA. zmiled@iupui.edu;Computer & Information Science, School of Science, Indiana University Purdue University Indianapolis, IN, 46202, USA;Computer & Information Science, School of Science, Indiana University Purdue University Indianapolis, IN, 46202, USA;Computer & Information Science, School of Science, Indiana University Purdue University Indianapolis, IN, 46202, USA;Eli Lilly & Company, Indianapolis, IN, 46202, USA;Eli Lilly & Company, Indianapolis, IN, 46202, USA;Eli Lilly & Company, Indianapolis, IN, 46202, USA
Venue:
Journal of Intelligent Information Systems
Year:
2004



Abstract

Scientific databases, and in particular chemical and biological databases, have reached massive sizes in recent years due to improvements in the bench-side high-throughput screening tools used by scientists. This rapid growth has shifted the bottleneck in discovery and product development from the bench side to the computational side, creating a need for new computational tools that facilitate access to and interpretation of such massive data.

This paper discusses the design and implementation of a computation histogram that speeds up access to large pharmaceutical databases. Unlike traditional histograms, in which approximate value distributions are obtained by grouping attribute values into buckets, the computation histogram proposed in this paper records the retrieval time and the calculation time of descriptors in a pharmaceutical drug candidate database. Both on-line and off-line techniques are proposed for updating the computation histogram so that an efficient query plan can be generated.

The efficiency of the proposed computation histogram is demonstrated using a drug candidate database that is used in the pharmaceutical drug discovery process. The histogram allows the result of a query to be either computed using a computational algorithm or retrieved from the database. In addition to the pharmaceutical drug candidate database, the proposed approach is applicable to other scientific databases such as biological and agroscience databases.
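
The abstract does not specify the data structure or the update rules, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a per-descriptor running mean of observed retrieval and calculation times, an on-line update applied after each observation, an off-line update that rebuilds the statistics from a timing log, and a planner that picks the cheaper option. The class names, method names, the `"retrieve"`/`"calculate"` labels, and the descriptor names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostEntry:
    """Running means of observed costs (in seconds) for one descriptor."""
    retrieval_mean: float = 0.0
    retrieval_n: int = 0
    calculation_mean: float = 0.0
    calculation_n: int = 0


class ComputationHistogram:
    """Hypothetical structure: retrieval vs. calculation time per descriptor."""

    def __init__(self):
        self._entries = {}

    def record(self, descriptor, kind, seconds):
        """On-line update (assumed form): fold one timing into a running mean."""
        entry = self._entries.setdefault(descriptor, CostEntry())
        if kind == "retrieve":
            entry.retrieval_n += 1
            entry.retrieval_mean += (seconds - entry.retrieval_mean) / entry.retrieval_n
        elif kind == "calculate":
            entry.calculation_n += 1
            entry.calculation_mean += (seconds - entry.calculation_mean) / entry.calculation_n
        else:
            raise ValueError(f"unknown kind: {kind!r}")

    def rebuild(self, log):
        """Off-line update (assumed form): recompute everything from a timing log.

        `log` is an iterable of (descriptor, kind, seconds) tuples.
        """
        self._entries = {}
        for descriptor, kind, seconds in log:
            self.record(descriptor, kind, seconds)

    def mean_cost(self, descriptor, kind):
        """Return the recorded mean cost, or None if nothing has been observed."""
        entry = self._entries.get(descriptor)
        if entry is None:
            return None
        if kind == "retrieve":
            return entry.retrieval_mean if entry.retrieval_n else None
        if kind == "calculate":
            return entry.calculation_mean if entry.calculation_n else None
        raise ValueError(f"unknown kind: {kind!r}")

    def plan(self, descriptor):
        """Choose the cheaper way to obtain a descriptor value.

        Assumption: with no calculation timing on record, fall back to
        retrieving the stored value from the database.
        """
        calc = self.mean_cost(descriptor, "calculate")
        retr = self.mean_cost(descriptor, "retrieve")
        if calc is None:
            return "retrieve"
        if retr is None:
            return "calculate"
        return "calculate" if calc < retr else "retrieve"


if __name__ == "__main__":
    hist = ComputationHistogram()
    # Hypothetical timings: this descriptor is cheap to recompute.
    hist.record("molecular_weight", "retrieve", 0.40)
    hist.record("molecular_weight", "calculate", 0.05)
    # Hypothetical timings: this descriptor is expensive to recompute.
    hist.record("binding_score", "retrieve", 0.20)
    hist.record("binding_score", "calculate", 1.50)
    for name in ("molecular_weight", "binding_score"):
        print(name, hist.plan(name))
```

In this sketch a query planner would call `plan` once per requested descriptor and route the request either to a computational routine or to a database lookup. A running mean is only one possible statistic; the paper may use a different summary of the recorded times.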