We describe the Purple SOX (PSOX) EMS, a prototype Extraction Management System currently being built at Yahoo!. The goal of the PSOX EMS is to manage a large number of sophisticated extraction pipelines across different application domains, at web scale and with minimal human involvement. Three key value propositions are described: extensibility, the ability to swap extraction operators in and out; explainability, the ability to track the provenance of extraction results; and social feedback support, the facility for gathering and reconciling feedback from multiple, potentially conflicting sources.
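The three value propositions can be illustrated with a small Python sketch. This is not the PSOX design, which the abstract does not detail. The names `Extraction`, `Pipeline`, `register`, `run`, and `reconcile` are hypothetical, and majority voting is only one assumed way of reconciling conflicting feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from PSOX.
Operator = Callable[[str], List[str]]


@dataclass(frozen=True)
class Extraction:
    value: str     # the extracted string
    operator: str  # provenance: name of the operator that produced it
    source: str    # provenance: input text it was derived from


@dataclass
class Pipeline:
    operators: Dict[str, Operator] = field(default_factory=dict)

    def register(self, name: str, op: Operator) -> None:
        # Extensibility: registering under an existing name swaps the operator.
        self.operators[name] = op

    def run(self, text: str) -> List[Extraction]:
        # Explainability: every result records which operator and input produced it.
        results = []
        for name, op in self.operators.items():
            for value in op(text):
                results.append(Extraction(value, name, text))
        return results


def reconcile(votes: List[Tuple[str, bool]]) -> bool:
    """Assumed policy: accept a result only on a strict majority of positive votes."""
    yes = sum(1 for _, ok in votes if ok)
    return yes > len(votes) - yes
```

In this sketch, calling `register` again with the same name replaces an operator without touching the rest of the pipeline, and each `Extraction` keeps enough provenance to explain where a value came from.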