Mining frequent patterns by pattern-growth: methodology and implications
ACM SIGKDD Explorations Newsletter - Special issue on “Scalable data mining algorithms”
Efficient discovery of risk patterns in medical data
Artificial Intelligence in Medicine
Mining frequent itemsets from uncertain data
PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
WHAM: a high-throughput sequence alignment method
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
MapReducing a genomic sequencing workflow
Proceedings of the second international workshop on MapReduce and its applications
SCALLA: A Platform for Scalable One-Pass Analytics Using MapReduce
ACM Transactions on Database Systems (TODS)
Hi-index | 0.00 |
Today large sequencing centers are producing genomic data at the rate of 10 terabytes a day and require complicated processing to transform massive amounts of noisy raw data into biological information. To address these needs, we develop a system for end-to-end processing of genomic data, including alignment of short read sequences, variation discovery, and deep analysis. We also employ a range of quality control mechanisms to improve data quality and parallel processing techniques for performance. In the demo, we will use real genomic data to show details of data transformation through the workflow, the usefulness of end results (ready for use as testable hypotheses), the effects of our quality control mechanisms and improved algorithms, and finally performance improvement.