In this paper we argue for developing information extraction (IE) programs in Datalog with embedded procedural extraction predicates. First, compared to current ad hoc composition using, e.g., Perl or C++, Datalog provides a cleaner and more powerful way to compose small extraction modules into larger programs. Writing IE programs this way therefore retains and enhances the important advantages of current approaches: programs are easy to understand, debug, and modify. Second, once IE programs are written in this framework, we can apply query optimization techniques to them. The resulting programs, when run over a variety of data sets, are more efficient than any monolithic program because they are optimized based on the statistics of the data on which they are invoked. We show that optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework, and we provide initial solutions. Extensive experiments over real-world data demonstrate that optimization is indeed vital for IE programs and that we can effectively optimize IE programs written in the proposed framework.
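To make the idea concrete, the following is a minimal sketch in Python, not the system described in the paper. It assumes a hypothetical rule of the form `talk(title, time) :- docs(d), extractTitle(d, title), extractTime(d, time), isSeminar(d)`, where `extractTitle` and `extractTime` stand in for procedural extraction predicates. The corpus, the regular expressions, and every function name are illustrative assumptions. The two plans evaluate the same rule body in different orders; the reordering is written by hand here, whereas an optimizer would choose it from data statistics.

```python
import re
from typing import Callable, Dict, List, Tuple

# Toy stand-in for the docs(d) relation (illustrative data, not from the paper).
DOCS: List[str] = [
    "Seminar: Query Optimization at 3:00 pm",
    "Lunch served at 12:00 pm",
    "Seminar: Datalog Basics at 10:30 am",
]

# Counts invocations of the (assumed expensive) extraction predicates.
calls: Dict[str, int] = {"extract_title": 0, "extract_time": 0}


def extract_title(doc: str) -> List[str]:
    """Hypothetical procedural predicate extractTitle(d, title)."""
    calls["extract_title"] += 1
    return re.findall(r"^(.+?) at ", doc)


def extract_time(doc: str) -> List[str]:
    """Hypothetical procedural predicate extractTime(d, time)."""
    calls["extract_time"] += 1
    return re.findall(r"\d{1,2}:\d{2} (?:am|pm)", doc)


def is_seminar(doc: str) -> bool:
    """Cheap selection predicate isSeminar(d)."""
    return doc.startswith("Seminar:")


def naive_plan(docs: List[str]) -> List[Tuple[str, str]]:
    """Evaluate the rule body in written order: extractors first, filter last."""
    out = []
    for d in docs:
        for title in extract_title(d):
            for time in extract_time(d):
                if is_seminar(d):
                    out.append((title, time))
    return out


def optimized_plan(docs: List[str]) -> List[Tuple[str, str]]:
    """Same rule, with the cheap selection pushed ahead of the extractors."""
    out = []
    for d in docs:
        if not is_seminar(d):
            continue  # skip both extractors for documents that cannot match
        for title in extract_title(d):
            for time in extract_time(d):
                out.append((title, time))
    return out


Plan = Callable[[List[str]], List[Tuple[str, str]]]


def run(plan: Plan) -> Tuple[List[Tuple[str, str]], int]:
    """Run a plan on DOCS; return its rows and the total extractor calls."""
    for key in calls:
        calls[key] = 0
    rows = plan(DOCS)
    return rows, sum(calls.values())


if __name__ == "__main__":
    for plan in (naive_plan, optimized_plan):
        rows, n = run(plan)
        print(plan.__name__, n, rows)
```

Both plans return the same tuples, but on this toy corpus the reordered plan invokes the extractors fewer times because the non-seminar document is discarded before any extraction runs. Keeping the rule declarative is what leaves this choice of evaluation order open.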