Information Extraction (IE), the problem of extracting structured information from unstructured text, has become the key enabler for many enterprise applications such as semantic search, business analytics, and regulatory compliance. While rule-based IE systems are widely used in practice due to their well-known "explainability," developing high-quality information extraction rules is known to be a labor-intensive and time-consuming iterative process. Our demonstration showcases SystemT IDE, the integrated development environment for SystemT, a state-of-the-art rule-based IE system from IBM Research that has been successfully embedded in multiple IBM enterprise products. SystemT IDE facilitates the development, testing, and analysis of high-quality IE rules by means of sophisticated techniques, ranging from data management to machine learning. We show how to build high-quality IE annotators using a suite of tools provided by SystemT IDE, including tools for computing data provenance, learning basic features such as regular expressions and dictionaries, and automatically refining rules based on labeled examples.