Self-supervised web search for any-k complete tuples

Authors:
Alexander Löser;Christoph Nagel;Stephan Pieper;Christoph Boden
Affiliations:
University of Technology, Berlin, Berlin;University of Technology, Berlin, Berlin;University of Technology, Berlin, Berlin;University of Technology, Berlin, Berlin
Venue:
Proceedings of the 2nd International Workshop on Business intelligencE and the WEB
Year:
2011

References:

Answering queries using views (extended abstract). PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Eddies: continuously adaptive query processing. SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Machine Learning
Declarative Data Cleaning: Language, Model, and Algorithms. Proceedings of the 27th International Conference on Very Large Data Bases
Robust query processing through progressive optimization. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Reference reconciliation in complex information spaces. Proceedings of the 2005 ACM SIGMOD international conference on Management of data
To search or to crawl?: towards a query optimizer for text-centric tasks. Proceedings of the 2006 ACM SIGMOD international conference on Management of data
A modular information extraction system. Intelligent Data Analysis
Toward best-effort information extraction. Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys (CSUR)
Open information extraction from the web. Communications of the ACM - Surviving the data deluge
The YAGO-NAGA approach to knowledge discovery. ACM SIGMOD Record
Optimizing SQL Queries over Text Databases. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Join Optimization of Information Extraction Output: Quality Matters! ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Exploring a Few Good Tuples from Text Databases. ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Beyond Search: Web-Scale Business Analytics. WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Quality-driven query answering for integrated information systems
FactRank: random walks on a web of facts. COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Classification algorithms for relation prediction. ICDEW '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering Workshops

Abstract

A common task of Web users is querying structured information from Web pages. In this paper, we propose a novel query processor that systematically discovers any-k relations from Web search results using conjunctive queries. The term 'any-k' denotes that the system does not rank the retrieved tuples. To realize this scenario, the query processor translates a structured query into keyword queries that are submitted to a search engine, forwards the search results to relation extractors, and then combines the extracted relations into result tuples. Unfortunately, relation extractors may fail to return a relation for a result tuple. We propose an information-theoretic approach for retrieving the missing attribute values of partially retrieved relations. Moreover, user-defined data sources may return fewer than k complete result tuples. To solve this problem, we extend the Eddy query processing mechanism [14] for our 'querying the Web' scenario with a continuous, adaptive routing model. At any point during query execution, the model determines the most promising incomplete row to process next in order to return any-k complete result tuples. We report an experimental evaluation over multiple relation extractors. Our experiments demonstrate that our query processor returns complete result tuples while processing only very few Web pages.
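
The any-k loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names any_k, completeness, and lookup are hypothetical, the lookup callable stands in for the whole chain of keyword query, search engine, and relation extractor, and the routing score (fraction of attributes already filled) is a simple placeholder for the paper's information-theoretic model. The sample rows and facts are invented for illustration only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical types: a row maps attribute names to values (None = missing).
Row = Dict[str, Optional[str]]
# Hypothetical stand-in for "keyword query -> search engine -> extractor":
# takes (keywords, missing attribute) and returns a value or None on failure.
Lookup = Callable[[str, str], Optional[str]]


def is_complete(row: Row) -> bool:
    return all(v is not None for v in row.values())


def completeness(row: Row) -> float:
    """Placeholder routing score: fraction of attributes already filled."""
    return sum(v is not None for v in row.values()) / len(row)


def any_k(rows: List[Row], lookup: Lookup, k: int, budget: int) -> List[Row]:
    """Return up to k complete rows (unranked), using at most `budget` lookups."""
    complete = [r for r in rows if is_complete(r)]
    pending = [dict(r) for r in rows if not is_complete(r)]  # work on copies
    while len(complete) < k and pending and budget > 0:
        # Route to the incomplete row with the highest score.
        pending.sort(key=completeness, reverse=True)
        row = pending.pop(0)
        missing = next(a for a, v in row.items() if v is None)
        # Build a keyword query from the values that are already known.
        keywords = " ".join(v for v in row.values() if v is not None)
        budget -= 1
        value = lookup(keywords, missing)
        if value is None:
            continue  # simplification: give up on a row when a lookup fails
        row[missing] = value
        (complete if is_complete(row) else pending).append(row)
    return complete[:k]


# Illustrative data only; none of these values come from the paper.
rows = [
    {"company": "Acme", "ceo": None, "city": "Berlin"},
    {"company": "Globex", "ceo": None, "city": None},
]
facts = {("Acme Berlin", "ceo"): "Jane Doe"}
result = any_k(rows, lambda q, a: facts.get((q, a)), k=1, budget=5)
print(result)
```

The loop stops as soon as k rows are complete, which mirrors the any-k idea of returning unranked results early instead of processing every candidate row.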