INDREX: in-database distributional relation extraction

Authors:
Torsten Kilias;Alexander Löser;Periklis Andritsos
Affiliations:
Technische Universität Berlin, Berlin, Germany;Technische Universität Berlin, Berlin, Germany;University of Toronto, Toronto, Canada
Venue:
Proceedings of the sixteenth international workshop on Data warehousing and OLAP
Year:
2013

Abstract

Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as IBM's SystemT or the open-source system GATE, solve this task with handcrafted rule sets that the system executes document by document. The user must therefore work through a highly interactive and iterative process: reading a document, expressing rules, testing these rules on the next document, and refining them. Until now, these systems have leveraged neither the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS, which would enable interactive rule refinement across documents and on the entire corpus. We propose the INDREX system, which for the first time enables a user to describe corpus-wide extraction tasks in a declarative language and to run interactive rule refinement queries. To enable this functionality, we extend a standard PostgreSQL installation with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and rules in the same RDBMS that already holds domain-specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction, and (3) the INDREX system can leverage the full power of the built-in indexing and query optimization techniques of the underlying RDBMS. In a preliminary study, we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.
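
To make the idea of a corpus-wide, declarative extraction rule more concrete, the following is a minimal sketch and not the INDREX implementation. It assumes a hypothetical schema (tables `sentence` and `span`), a hypothetical helper `extract_pairs`, and made-up example sentences. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL and plain SQL in place of the user-defined functions the abstract mentions. The rule is one SQL query that pairs two organization spans in the same sentence when the text between them equals a given pattern, and it is evaluated over all stored sentences at once rather than document by document.

```python
import sqlite3

# Hypothetical schema for illustration only; the actual INDREX schema
# and its user-defined functions are not described in this abstract.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sentence (doc_id INTEGER, sent_id INTEGER, text TEXT,
                       PRIMARY KEY (doc_id, sent_id));
CREATE TABLE span (doc_id INTEGER, sent_id INTEGER,
                   start_pos INTEGER, end_pos INTEGER,  -- 0-based, end exclusive
                   label TEXT, surface TEXT);
CREATE INDEX span_by_sentence ON span (doc_id, sent_id, label);
""")

# Made-up toy corpus with pre-annotated organization mentions.
corpus = [
    (1, 1, "Acme Corp acquired Widget Ltd in March.", ["Acme Corp", "Widget Ltd"]),
    (2, 1, "Globex acquired Initech last year.", ["Globex", "Initech"]),
    (2, 2, "Initech praised Globex.", ["Initech", "Globex"]),
]
for doc_id, sent_id, text, orgs in corpus:
    conn.execute("INSERT INTO sentence VALUES (?, ?, ?)", (doc_id, sent_id, text))
    for name in orgs:
        start = text.find(name)
        conn.execute("INSERT INTO span VALUES (?, ?, ?, ?, ?, ?)",
                     (doc_id, sent_id, start, start + len(name), "ORG", name))

# The extraction rule as a single declarative query over the whole corpus.
# SQLite's substr() is 1-based, hence the "+ 1" on the 0-based end offset.
RULE = """
SELECT a.surface, b.surface
FROM span AS a
JOIN span AS b
  ON a.doc_id = b.doc_id AND a.sent_id = b.sent_id AND a.end_pos < b.start_pos
JOIN sentence AS s
  ON s.doc_id = a.doc_id AND s.sent_id = a.sent_id
WHERE a.label = 'ORG' AND b.label = 'ORG'
  AND trim(substr(s.text, a.end_pos + 1, b.start_pos - a.end_pos)) = ?
ORDER BY a.doc_id, a.sent_id, a.start_pos
"""

def extract_pairs(connection, pattern):
    """Return (left, right) organization pairs whose connecting text equals pattern."""
    return connection.execute(RULE, (pattern,)).fetchall()

acquisitions = extract_pairs(conn, "acquired")
print(acquisitions)
```

Refining the rule in this sketch amounts to rerunning the query with a different pattern, for example `extract_pairs(conn, "praised")`, and inspecting the matches across the whole toy corpus. Because the spans live in ordinary tables, they could also be joined against existing structured tables, which is the kind of reuse the abstract points to.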