The main theme of this paper is to take inspiration from methods used in computer science and related disciplines and to apply them to develop improved biotechnology. In particular, our proposed improvements adapt various information-theoretic coding techniques that originate in computing and information processing, retailored to work in the biotechnology context. (a) We apply Error-Correcting Codes, developed to correct transmission errors in electronic media, to decrease (in certain contexts, optimally) error rates in optically addressed DNA synthesis (e.g., of DNA chips). (b) We apply Vector-Quantization (VQ) Coding techniques, previously used to cluster, quantize, and compress data such as speech and images, to improve I/O rates (in certain contexts, optimally) for transforming electronic data to and from DNA with bounded error. (c) We also apply VQ Coding techniques, some of which cluster the data hierarchically, to improve associative search in DNA databases by reducing the problem to that of exact affinity separation. These improvements in biotechnology appear to have some general applicability beyond biomolecular computing.

As a motivating example, this paper improves biotechnology methods for associative search in DNA databases. Baum [B95] previously proposed using biotechnology affinity methods (DNA annealing) to perform massively parallel associative search in large databases encoded as DNA strands, but left many issues undeveloped. Using in part our improved biotechnology techniques based on Error-Correction and VQ Coding, we develop detailed procedures for the following tasks: (i) The database may initially reside in conventional (electronic, magnetic, or optical) media rather than in the form of DNA strands. For input and output (I/O) to and from conventional media, we apply DNA chip technology improved by Error-Correction and VQ Coding methods for error correction and compression.
(ii) The query may not match any data in the database exactly, or even partially; since DNA annealing affinity methods work best for exact matches, we apply various VQ Coding methods to refine the associative search to exact matches. (iii) We also briefly discuss how to extend associative search queries in DNA databases to more sophisticated hybrid queries that also include Boolean formula conditionals over a bounded number of Boolean variables, by combining our methods for DNA associative search with known biomolecular computing (BMC) methods for solving small SAT problems. For example, these extended queries could be executed on natural DNA strands (e.g., from blood or other body tissues) to which DNA words encoding binary information about each strand have been appended; this information could include the social security number of the person whose DNA was sampled, the cell type, the date, further medical data, etc.
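To make the error-correction idea in (a) concrete, here is a minimal sketch using the textbook Hamming(7,4) code, which corrects any single flipped bit in a 7-bit word. It is a generic illustration, not the code construction used in this paper; modeling a synthesis error as one flipped bit, and the names `hamming74_encode` and `hamming74_decode`, are our assumptions.

```python
# Generic Hamming(7,4) sketch (illustrative assumption: a synthesis error
# is modeled as a single flipped bit in a 7-bit word).

def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.
    Layout by 1-based position: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one flipped bit, then return the 4 data bits."""
    w = list(word)
    syndrome = 0
    for b in range(3):
        # Parity check b covers every position whose index has bit b set.
        parity = 0
        for pos in range(1, 8):
            if pos & (1 << b):
                parity ^= w[pos - 1]
        syndrome |= parity << b
    if syndrome:  # a nonzero syndrome is the 1-based position of the error
        w[syndrome - 1] ^= 1
    return [w[2], w[4], w[5], w[6]]


data = [1, 0, 1, 1]
received = hamming74_encode(data)
received[5] ^= 1  # simulate a single error at position 6
print(hamming74_decode(received))  # → [1, 0, 1, 1]
```

The decoder computes a 3-bit syndrome whose value, when nonzero, is exactly the position of the flipped bit; two or more errors in one word exceed what this particular code can correct.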
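The VQ coding in (b) can be sketched with Lloyd's algorithm (k-means), one standard way to train a codebook: each input vector is replaced by the index of its nearest codeword, so only about log2(k) bits per vector would need to be written to DNA, at the price of a lossy reconstruction. The helper names, the squared-distance distortion measure, and the seeded random initialization are illustrative assumptions, not the procedure from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance (the assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def train_codebook(data, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate nearest-codeword assignment and centroid update."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(data, k)]  # initial codewords
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in data:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def encode(codebook, data):
    """Compress: each vector becomes a small integer index."""
    return [nearest(codebook, v) for v in data]


def decode(codebook, indices):
    """Reconstruct: each index becomes its codeword (lossy)."""
    return [codebook[i] for i in indices]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = train_codebook(points, k=2)
indices = encode(codebook, points)  # 1 bit per vector with k = 2
```

Each reconstructed vector is off by at most the distance to its codeword, so a larger codebook trades more bits per vector for lower distortion.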
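The reduction in (c) and (ii), from an approximate query to an exact match, can be sketched with a tree-structured VQ: the query descends a hierarchy of codewords, and the sequence of branch choices becomes a key that must match exactly, the role a DNA address word would play under exact affinity separation. The hand-built two-level tree, the `Node` class, and the other names are hypothetical; in practice the tree would be trained (for example, by recursive k-means), and this is not claimed to be the paper's procedure.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """One codeword in a hypothetical tree-structured codebook; leaves have no children."""
    def __init__(self, centroid, children=()):
        self.centroid = centroid
        self.children = list(children)


def tsvq_key(root, v):
    """Descend from the root, taking the nearest child at each level.
    The branch choices, joined as a string, form an exact-match key."""
    path, node = [], root
    while node.children:
        i, node = min(enumerate(node.children),
                      key=lambda ic: sq_dist(ic[1].centroid, v))
        path.append(str(i))
    return "".join(path)


def build_index(root, records):
    """Group (label, vector) records under the key of their vector."""
    index = defaultdict(list)
    for label, vec in records:
        index[tsvq_key(root, vec)].append(label)
    return index


def lookup(index, root, query):
    """Approximate query -> exact key -> exact lookup (no fuzzy matching)."""
    return index.get(tsvq_key(root, query), [])


root = Node((5, 5), [
    Node((0, 0), [Node((0, -2)), Node((0, 2))]),
    Node((10, 10), [Node((10, 8)), Node((10, 12))]),
])
index = build_index(root, [("alpha", (0.1, -1.9)), ("beta", (0.2, 2.1)),
                           ("gamma", (9.8, 12.3))])
print(lookup(index, root, (0.5, 1.5)))  # → ['beta']
```

Because each level commits to one branch, a query close to a cell boundary can be routed away from its true nearest record; that is the usual accuracy cost of tree-structured VQ in exchange for needing only exact matches.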
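For the hybrid queries in (iii), one way to picture the Boolean part is as a sequence of extraction steps, one per clause, of the kind used by generate-and-filter BMC methods for small SAT instances. This sketch assumes the condition is given in conjunctive normal form over the bits appended to each strand, with DIMACS-style literals (+i for bit i set, -i for bit i clear); the function names and the (label, bits) layout are hypothetical.

```python
def extract(tube, clause):
    """Keep strands whose appended bits satisfy one OR-clause.
    Literal +i requires bit i (1-based) to be 1; -i requires it to be 0."""
    return [(label, bits) for label, bits in tube
            if any(bits[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause)]


def hybrid_filter(tube, cnf):
    """Apply the CNF clause by clause, as successive extraction steps."""
    for clause in cnf:
        tube = extract(tube, clause)
    return tube


# Strands surviving an associative-search step, each with 3 appended bits.
tube = [("s1", (1, 0, 1)), ("s2", (0, 1, 1)), ("s3", (1, 1, 0))]
print(hybrid_filter(tube, [[1, 2], [-3]]))  # → [('s3', (1, 1, 0))]
```

Since the number of Boolean variables is bounded, the number of extraction steps is bounded by the number of clauses, independent of how many strands are in the tube.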