GeoDeepDive: statistical inference using familiar data-processing languages

Authors:
Ce Zhang; Vidhya Govindaraju; Jackson Borchardt; Tim Foltz; Christopher Ré; Shanan Peters
Affiliations:
University of Wisconsin-Madison, Madison, WI, USA (all authors)
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013

Abstract

We describe our proposed demonstration of GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles. Doing so requires solving a host of classical data management challenges, including data acquisition (e.g., from scanned documents), data extraction, and data integration. SIGMOD attendees will see demonstrations of three aspects of our system: (1) an end-to-end system whose quality is high enough to perform novel geological science, yet which was written by a team small enough that each aspect can be explained manageably; (2) a simple feature engineering system that allows a user to write in familiar SQL or Python; and (3) the effect of different sources of feedback on result quality, including expert labeling, distant supervision, traditional rules, and crowd-sourced data. Our prototype builds on our work integrating statistical inference and learning tools into traditional database systems. If successful, our demonstration will show attendees that data processing systems that use machine learning contain many familiar data processing problems, such as efficient querying, indexing, and supporting tools for database-backed websites, none of which are machine-learning problems per se.
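
The abstract does not show what a user-written feature or a distant-supervision rule looks like, so the following is only a minimal Python sketch of the general idea behind aspects (2) and (3). It is not GeoDeepDive's actual interface: the function names `extract_features` and `distant_label`, the feature strings, and the one-entry knowledge base are all hypothetical, chosen purely for illustration.

```python
import re

# Hypothetical toy knowledge base for distant supervision: (formation, age)
# pairs assumed to be known already from an existing curated database.
KNOWN_FORMATION_AGES = {("Morrison Formation", "Jurassic")}


def extract_features(sentence, formation, age):
    """Return simple string features for a (formation, age) candidate pair.

    Illustrative only: the features are the words that appear between the
    two mentions in the sentence, plus a count of those words.
    """
    f_pos = sentence.find(formation)
    a_pos = sentence.find(age)
    if f_pos < 0 or a_pos < 0:
        return []  # one of the mentions is not in this sentence

    # Take the text between the two mentions, whichever comes first.
    if f_pos < a_pos:
        between = sentence[f_pos + len(formation):a_pos]
    else:
        between = sentence[a_pos + len(age):f_pos]

    words = re.findall(r"[A-Za-z]+", between.lower())
    features = ["num_words_between=%d" % len(words)]
    features.extend("word_between=" + w for w in words)
    return features


def distant_label(formation, age):
    """Label a candidate pair as positive if the knowledge base contains it."""
    return (formation, age) in KNOWN_FORMATION_AGES


sentence = "The Morrison Formation is of Late Jurassic age."
print(extract_features(sentence, "Morrison Formation", "Jurassic"))
print(distant_label("Morrison Formation", "Jurassic"))
```

In a sketch like this, the feature strings would feed a statistical model, while the distant-supervision labels stand in for hand-labeled training examples; expert labels, rules, or crowd-sourced judgments could supply the same kind of signal.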