Natural Language Engineering
We describe the design and implementation of a system for data exploration over dependency parses and derived semantic representations in a large-scale NLP-based search system at powerset.com. Because the document repository and the processing infrastructure are distributed, and because the corpus data is stored in complex representations, standard text analysis tools such as grep, awk, or language modeling toolkits are not applicable. This paper explores the challenges of extracting statistical information and of building language models in such a distributed NLP environment, and introduces a corpus analysis system, Oceanography, that simplifies the writing of analysis code and transparently takes advantage of existing distributed processing infrastructure.
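To make the kind of analysis described above concrete, the following is a minimal sketch of a map/reduce-style count of dependency relations over a sharded corpus. It is not Oceanography's API, which the abstract does not specify. The shard layout, the (head, relation, dependent) triple format, and the names SHARDS, map_shard, reduce_counts and count_relations are all assumptions made for illustration, and the shards are processed in a single process rather than on a cluster.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each shard stands in for one partition of a distributed
# repository. A shard is a list of sentences, and a sentence is a list of
# (head, relation, dependent) dependency triples. This layout is an assumption.
SHARDS = [
    [[("describe", "nsubj", "we"), ("describe", "dobj", "design")]],
    [
        [("explores", "nsubj", "paper"), ("explores", "dobj", "challenges")],
        [("introduces", "dobj", "system")],
    ],
]


def map_shard(shard):
    """Map step: count relation labels within a single shard."""
    counts = Counter()
    for sentence in shard:
        for _head, relation, _dependent in sentence:
            counts[relation] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_relations(shards):
    """Apply the map step to every shard, then fold the partial counts."""
    return reduce(reduce_counts, map(map_shard, shards), Counter())


if __name__ == "__main__":
    for relation, count in count_relations(SHARDS).most_common():
        print(relation, count)
```

Because the reduce step is associative and commutative, the per-shard counts could in principle be computed on separate machines and merged in any order, which is the property a distributed framework relies on.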