Fast, small-space algorithms for approximate histogram maintenance
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
Gigascope: high performance network monitoring with an SQL interface
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Histogramming Data Streams with Fast Per-Item Processing
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
PADS: a domain-specific language for processing ad hoc data
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
The next 700 data description languages
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
How to summarize the universe: dynamic maintenance of quantiles
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
ANTLRWorks: an ANTLR grammar development environment
Software—Practice & Experience
Adventures in bidirectional programming
FSTTCS'07 Proceedings of the 27th international conference on Foundations of software technology and theoretical computer science
Hi-index | 0.00 |
Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to resuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and for querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.