Relational learning of pattern-match rules for information extraction
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh Innovative applications of artificial intelligence conference innovative applications of artificial intelligence
Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Database Management Systems
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Web-scale information extraction in knowitall: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
Natural Language Engineering
GATE: an architecture for development of robust HLT applications
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Integrating Unstructured Data into Relational Databases
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
To search or to crawl?: towards a query optimizer for text-centric tasks
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Towards a query optimizer for text-centric tasks
ACM Transactions on Database Systems (TODS)
Declarative information extraction using datalog with embedded extraction predicates
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Get another label? improving data quality and data mining using multiple, noisy labelers
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
A quality-aware optimizer for information extraction
ACM Transactions on Database Systems (TODS)
An Algebraic Approach to Rule-Based Information Extraction
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Optimizing SQL Queries over Text Databases
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Join Optimization of Information Extraction Output: Quality Matters!
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Exploring a Few Good Tuples from Text Databases
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
A probabilistic model of redundancy in information extraction
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
A quality-aware optimizer for information extraction
ACM Transactions on Database Systems (TODS)
Enterprise information extraction: recent developments and open challenges
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
SystemT: a declarative information extraction system
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Systems Demonstrations
Just-in-time information extraction using extraction views
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
MADden: query-driven statistical text analytics
Proceedings of the 21st ACM international conference on Information and knowledge management
Hi-index | 0.00 |
Text documents often embed data that is structured in nature. This structured data is increasingly exposed using information extraction systems, which generate structured relations from documents, introducing an opportunity to process expressive, structured queries over text databases. This paper discusses our SQoUT1 project, which focuses on processing structured queries over relations extracted from text databases. We show how, in our extraction-based scenario, query processing can be decomposed into a sequence of basic steps: retrieving relevant text documents, extracting relations from the documents, and joining extracted relations for queries involving multiple relations. Each of these steps presents different alternatives and together they form a rich space of possible query execution strategies. We identify execution efficiency and output quality as the two critical properties of a query execution, and argue that an optimization approach needs to consider both properties. To this end, we take into account the userspecified requirements for execution efficiency and output quality, and choose an execution strategy for each query based on a principled, cost-based comparison of the alternative execution strategies.