Relation extraction transforms the textual representation of a relationship into the relational model of a data warehouse. Early systems, such as IBM's SystemT or the open-source system GATE, solve this task with handcrafted rule sets that the system executes document by document. The user must therefore follow a highly interactive and iterative process: reading a document, expressing rules, testing these rules on the next document, and refining them. So far, these systems leverage neither the full potential of built-in declarative query languages nor the indexing and query optimization techniques of a modern RDBMS, which would enable interactive rule refinement across documents and on the entire corpus. We propose the INDREX system, which for the first time enables a user to describe corpus-wide extraction tasks in a declarative language and to run interactive rule refinement queries. To provide this functionality, we extend a standard PostgreSQL installation with a set of white-box user-defined functions that enable corpus-wide transformations from sentences into relationships. We store the text corpus and the rules in the same RDBMS that already holds domain-specific structured data. As a result, (1) the user can leverage this data to further adapt rules to the target domain, (2) the user does not need an additional system for rule extraction, and (3) the INDREX system can leverage the full power of the built-in indexing and query optimization techniques of the underlying RDBMS. In a preliminary study, we report on the feasibility of this disruptive approach and show multiple queries in INDREX on the Reuters Corpus, Volume 1.
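To make the idea of a corpus-wide declarative rule more concrete, the following is a minimal sketch and not the INDREX implementation. It assumes a hypothetical schema (`sentence`, `span`, `company`), hand-annotated toy data, and SQLite in place of PostgreSQL so that it runs without a server; none of these names come from the abstract. The rule "an organization, followed by a given verb, followed by an organization" is written as a single SQL join that runs over all documents at once, and the optional join against `company` illustrates how structured domain data stored in the same database could be used to refine a rule.

```python
import sqlite3


def build_corpus():
    """Create a toy in-memory corpus. Schema and data are illustrative assumptions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sentence (sid INTEGER PRIMARY KEY, doc_id INTEGER, text TEXT);
        CREATE TABLE span (sid INTEGER, kind TEXT, start_pos INTEGER,
                           end_pos INTEGER, value TEXT);
        CREATE TABLE company (name TEXT PRIMARY KEY);  -- structured domain data
        CREATE INDEX span_kind_idx ON span (kind, sid);
    """)
    conn.executemany("INSERT INTO sentence VALUES (?, ?, ?)", [
        (1, 1, "Acme Corp acquired Widget Ltd in 1996."),
        (2, 2, "Globex acquired Initech last week."),
        (3, 2, "Initech hired Hooli's former chief."),
    ])
    # Pre-computed annotations with character offsets into the sentence text.
    conn.executemany("INSERT INTO span VALUES (?, ?, ?, ?, ?)", [
        (1, "ORG", 0, 9, "Acme Corp"), (1, "VERB", 10, 18, "acquired"),
        (1, "ORG", 19, 29, "Widget Ltd"),
        (2, "ORG", 0, 6, "Globex"), (2, "VERB", 7, 15, "acquired"),
        (2, "ORG", 16, 23, "Initech"),
        (3, "ORG", 0, 7, "Initech"), (3, "VERB", 8, 13, "hired"),
        (3, "ORG", 14, 19, "Hooli"),
    ])
    conn.execute("INSERT INTO company VALUES (?)", ("Acme Corp",))
    return conn


def extract(conn, verb, known_subjects_only=False):
    """Run the rule 'ORG <verb> ORG' over the whole corpus in one query.

    With known_subjects_only=True, the subject must also appear in the
    company table, which mimics refining a rule with domain data.
    """
    domain_join = ("JOIN company AS c ON c.name = a.value"
                   if known_subjects_only else "")
    sql = f"""
        SELECT s.doc_id, a.value, b.value
        FROM span AS v
        JOIN span AS a ON a.sid = v.sid AND a.kind = 'ORG'
                      AND a.end_pos <= v.start_pos
        JOIN span AS b ON b.sid = v.sid AND b.kind = 'ORG'
                      AND b.start_pos >= v.end_pos
        JOIN sentence AS s ON s.sid = v.sid
        {domain_join}
        WHERE v.kind = 'VERB' AND v.value = ?
        ORDER BY s.doc_id, a.start_pos
    """
    return conn.execute(sql, (verb,)).fetchall()


if __name__ == "__main__":
    conn = build_corpus()
    print(extract(conn, "acquired"))
    print(extract(conn, "acquired", known_subjects_only=True))
```

Because the rule is an ordinary query, changing the verb or adding the domain join re-evaluates it against every stored sentence at once rather than one document at a time, and the database's own indexes (here the assumed `span_kind_idx`) are available to the query planner.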