Answering queries using views (extended abstract)
PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Machine Learning
Declarative Data Cleaning: Language, Model, and Algorithms
Proceedings of the 27th International Conference on Very Large Data Bases
Robust query processing through progressive optimization
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Reference reconciliation in complex information spaces
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
To search or to crawl?: towards a query optimizer for text-centric tasks
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
A modular information extraction system
Intelligent Data Analysis
Toward best-effort information extraction
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A survey of top-k query processing techniques in relational database systems
ACM Computing Surveys (CSUR)
Open information extraction from the web
Communications of the ACM - Surviving the data deluge
The YAGO-NAGA approach to knowledge discovery
ACM SIGMOD Record
Optimizing SQL Queries over Text Databases
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Join Optimization of Information Extraction Output: Quality Matters!
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Exploring a Few Good Tuples from Text Databases
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Beyond Search: Web-Scale Business Analytics
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Quality-driven query answering for integrated information systems
FactRank: random walks on a web of facts
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Classification algorithms for relation prediction
ICDEW '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering Workshops
A common task of Web users is querying structured information from Web pages. In this paper we propose a novel query processor that systematically discovers any-k relations from Web search results using conjunctive queries. The term 'any-k' denotes that the system does not rank the retrieved tuples. To realize this scenario, the query processor translates a structured query into keyword queries that are submitted to a search engine, forwards the search results to relation extractors, and then combines the extracted relations into result tuples. Unfortunately, relation extractors may fail to return a relation for a result tuple. We propose an information-theoretic approach for retrieving the missing attribute values of partially retrieved relations. Moreover, user-defined data sources may return fewer than k complete result tuples. To solve this problem, we extend the Eddy query processing mechanism [14] for our 'querying the Web' scenario with a continuous, adaptive routing model. At any point during query execution, the model determines the most promising incomplete row to process next, so that any-k complete result tuples can be returned. We report a thorough experimental evaluation over multiple relation extractors. Our experiments demonstrate that our query processor returns complete result tuples while processing only very few Web pages.
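The any-k routing idea can be illustrated with a minimal sketch. The abstract does not specify the routing model, so the heuristic used here (prefer the incomplete row with the fewest missing attributes) is an assumption standing in for it, and the names `any_k`, `fetch`, and `budget` are hypothetical. In a real system `fetch` would wrap the keyword query, the search engine call, and the relation extractor; here it is just a callable supplied by the caller.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[str]]


def missing(row: Row) -> List[str]:
    """Attributes of a row that have not been filled in yet."""
    return [attr for attr, value in row.items() if value is None]


def any_k(
    rows: List[Row],
    k: int,
    fetch: Callable[[Row, str], Optional[str]],
    budget: int,
) -> List[Row]:
    """Return up to k complete rows, unranked, using at most `budget` fetches."""
    rows = [dict(r) for r in rows]  # work on copies, leave the input untouched
    complete = [r for r in rows if not missing(r)]
    pending = [r for r in rows if missing(r)]

    while len(complete) < k and pending and budget > 0:
        # Assumed heuristic (not the paper's model): the row closest to
        # completion is treated as the most promising one.
        idx = min(range(len(pending)), key=lambda i: len(missing(pending[i])))
        row = pending[idx]
        attr = missing(row)[0]

        budget -= 1
        value = fetch(row, attr)  # hypothetical: search + extract one value
        if value is None:
            pending.pop(idx)      # extractor failed; drop this row
            continue

        row[attr] = value
        if not missing(row):
            pending.pop(idx)
            complete.append(row)

    return complete[:k]
```

With three candidate rows of which one is already complete and one lacks a single attribute, a call with `k=2` needs only one fetch: the loop completes the nearly finished row and stops, which mirrors the goal of returning k complete tuples while touching few pages.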