Big Data Pipelines decompose complex analyses of large data sets into a series of simpler tasks, with independently tuned components for each task. This modular setup allows components to be reused across several different pipelines. However, the interaction of independently tuned pipeline components yields poor end-to-end performance, because errors introduced by one component cascade through the whole pipeline and degrade overall accuracy. We propose a novel model for reasoning across components of Big Data Pipelines in a probabilistically well-founded manner. Our key idea is to view the interaction of components as dependencies in an underlying graphical model. Different message-passing schemes on this graphical model yield different inference algorithms that trade off end-to-end performance against computational cost. We instantiate our framework with an efficient beam search algorithm and demonstrate its efficiency on two Big Data Pipelines: parsing and relation extraction.
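To make the beam search idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each pipeline component can be called on an input and returns scored candidate outputs, and that the pipeline forms a simple chain, so the score of a joint hypothesis is the product of the per-stage probabilities. The names `beam_search_pipeline`, `tagger`, and `parser`, and all probabilities in the toy example, are hypothetical.

```python
import heapq
import math
from typing import Any, Callable, List, Tuple

# Assumed interface: a component maps an input to (output, probability) candidates.
Component = Callable[[Any], List[Tuple[Any, float]]]


def beam_search_pipeline(
    x: Any, components: List[Component], beam_width: int
) -> List[Tuple[float, Any]]:
    """Propagate the `beam_width` best joint hypotheses through a chain of components.

    Returns (log-probability, final output) pairs, best first.
    """
    beam: List[Tuple[float, Any]] = [(0.0, x)]  # log(1) = 0 for the raw input
    for component in components:
        expanded = []
        for logp, value in beam:
            for out, p in component(value):
                if p > 0.0:  # skip impossible candidates; log(0) is undefined
                    expanded.append((logp + math.log(p), out))
        # Prune: keep only the highest-scoring joint hypotheses.
        beam = heapq.nlargest(beam_width, expanded, key=lambda h: h[0])
    return beam


# Hypothetical two-stage pipeline with made-up probabilities.
def tagger(_: Any) -> List[Tuple[Any, float]]:
    return [("A", 0.6), ("B", 0.4)]


def parser(tag: Any) -> List[Tuple[Any, float]]:
    table = {
        "A": [("A1", 0.55), ("A2", 0.45)],
        "B": [("B1", 0.9), ("B2", 0.1)],
    }
    return table[tag]


greedy = beam_search_pipeline("x", [tagger, parser], beam_width=1)
wider = beam_search_pipeline("x", [tagger, parser], beam_width=2)
print(greedy[0][1], round(math.exp(greedy[0][0]), 2))
print(wider[0][1], round(math.exp(wider[0][0]), 2))
```

With `beam_width=1` the sketch behaves like a conventional pipeline: it commits to the first stage's top output `A` and ends at `A1` with joint probability 0.6 × 0.55 = 0.33. With `beam_width=2` the lower-ranked output `B` survives the first stage, and the joint best hypothesis becomes `B1` with 0.4 × 0.9 = 0.36. In this toy setting, widening the beam buys end-to-end accuracy at the price of more calls to downstream components.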