Beyond search: Retrieving complete tuples from a text-database

Authors:
Alexander Löser;Christoph Nagel;Stephan Pieper;Christoph Boden
Affiliations:
Database Systems and Information Management Group (DIMA), Technische Universität Berlin (TUB), Berlin, Germany 10587;Database Systems and Information Management Group (DIMA), Technische Universität Berlin (TUB), Berlin, Germany 10587;Database Systems and Information Management Group (DIMA), Technische Universität Berlin (TUB), Berlin, Germany 10587;Database Systems and Information Management Group (DIMA), Technische Universität Berlin (TUB), Berlin, Germany 10587
Venue:
Information Systems Frontiers
Year:
2013

Abstract

A common task of Web users is querying structured information from Web pages. To support this scenario, we propose a novel query processor that systematically discovers instances of semantic relations in Web search results and joins these relation instances into complex result tuples using conjunctive queries. The query processor transforms a structured user query into keyword queries that are submitted to a search engine, forwards the search results to a relation extractor, and then combines the extracted relations into complex result tuples. The processor automatically learns discriminative and effective keywords for different types of semantic relations. In this way, it leverages the index of a search engine to query potentially billions of pages. Unfortunately, relation extractors may fail to return a relation for a result tuple. Moreover, user-defined data sources may not return at least k complete result tuples. We therefore propose an adaptive routing model based on information theory for retrieving missing attributes of incomplete result tuples. The model determines the most promising next incomplete tuple and attribute type for returning any-k complete result tuples at any point during query execution. We report a thorough experimental evaluation over multiple relation extractors. Our query processor returns complete result tuples while processing only very few Web pages.
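
To make the routing idea in the abstract more concrete, the following is a minimal sketch of how a processor might choose the next incomplete tuple and attribute type to probe. It is not the paper's information-theoretic model, which the abstract does not specify. As a stand-in it uses a simple probability heuristic: prefer the tuple most likely to become complete, and within it the missing attribute most likely to be extracted. The function name next_probe, the attribute names, and the success probabilities are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A partial result tuple maps attribute names to values; None marks a missing value.
PartialTuple = Dict[str, Optional[str]]


def next_probe(
    tuples: List[PartialTuple],
    success_prob: Dict[str, float],
) -> Optional[Tuple[int, str]]:
    """Return (tuple index, attribute) to probe next, or None if nothing is missing.

    Heuristic stand-in (assumption, not the paper's model): score each
    incomplete tuple by the product of the estimated extraction success
    probabilities of its missing attributes, then pick the highest score.
    """
    best: Optional[Tuple[int, str]] = None
    best_score = -1.0
    for index, partial in enumerate(tuples):
        missing = [attr for attr, value in partial.items() if value is None]
        if not missing:
            continue  # already complete, nothing to retrieve
        # Estimated probability that every missing attribute can be filled.
        p_complete = 1.0
        for attr in missing:
            p_complete *= success_prob[attr]
        if p_complete > best_score:
            # Within the chosen tuple, probe the attribute most likely to succeed.
            attr_to_probe = max(missing, key=lambda a: success_prob[a])
            best, best_score = (index, attr_to_probe), p_complete
    return best


# Hypothetical example data: three tuples for a company/CEO/headquarters query.
partial_tuples: List[PartialTuple] = [
    {"company": "A", "ceo": None, "hq": None},
    {"company": "B", "ceo": "X", "hq": None},
    {"company": "C", "ceo": "Y", "hq": "Z"},
]
# Hypothetical per-attribute extraction success estimates.
success_prob = {"company": 0.9, "ceo": 0.5, "hq": 0.6}

print(next_probe(partial_tuples, success_prob))  # (1, 'hq')
```

In this example the second tuple is chosen because only one attribute is missing and its estimated success probability (0.6) exceeds the joint estimate for the first tuple (0.5 × 0.6 = 0.3). A real processor would re-estimate these probabilities as extraction results arrive.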