Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally, we validate the potential benefits of our approach through extensive experiments over real-world blog data.
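To make the idea of an extraction algebra with interchangeable plans more concrete, the following is a minimal sketch in Python. It is not the algebra from the paper: the operator names (regex_extract, dict_extract, follows_join), the two plans, and the sample text are all hypothetical and chosen only for illustration. The sketch assumes a rule that pairs a dictionary term with a regular-expression match starting at most max_gap characters after it, and shows two ways to evaluate it: run both extractors over the whole document and join, or run the cheap dictionary extractor first and apply the regular expression only in a small window after each hit.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """A region of the document, as [begin, end) character offsets."""
    begin: int
    end: int


def regex_extract(pattern: str, text: str, offset: int = 0) -> List[Span]:
    """Hypothetical operator: one span per regex match, shifted by offset."""
    return [Span(m.start() + offset, m.end() + offset)
            for m in re.finditer(pattern, text)]


def dict_extract(terms: List[str], text: str) -> List[Span]:
    """Hypothetical operator: one span per literal occurrence of a term."""
    spans = []
    for term in terms:
        for m in re.finditer(re.escape(term), text):
            spans.append(Span(m.start(), m.end()))
    return sorted(spans)


def follows_join(left: List[Span], right: List[Span],
                 max_gap: int) -> List[Tuple[Span, Span]]:
    """Hypothetical operator: pairs where right starts 0..max_gap chars after left."""
    return sorted({(l, r) for l in left for r in right
                   if 0 <= r.begin - l.end <= max_gap})


def plan_naive(text, terms, pattern, max_gap):
    """Plan A: evaluate both extractors over the full text, then join."""
    return follows_join(dict_extract(terms, text),
                        regex_extract(pattern, text), max_gap)


def plan_restricted(text, terms, pattern, max_gap, max_len):
    """Plan B: run the regex only in a window after each dictionary hit.

    max_len is an assumed upper bound on the length of a regex match.
    """
    pairs = set()
    for l in dict_extract(terms, text):
        window_end = min(len(text), l.end + max_gap + max_len)
        window = text[l.end:window_end]
        for r in regex_extract(pattern, window, offset=l.end):
            if r.begin - l.end <= max_gap:
                pairs.add((l, r))
    return sorted(pairs)


# Illustrative input only; not data from the paper.
text = "Please call 555-1234 today. Old fax: 555-9999. Or phone 555-0000 now."
terms = ["call", "phone"]
pattern = r"\d{3}-\d{4}"

naive = plan_naive(text, terms, pattern, max_gap=1)
restricted = plan_restricted(text, terms, pattern, max_gap=1, max_len=8)
```

On this input both plans return the same pairs, but the second never runs the regular expression over text that is far from a dictionary hit. That equivalence is a property of this example rather than a general guarantee: a regular expression applied to a window can match differently than on the full text, so a real optimizer would need to establish when such a rewrite is safe.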