Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE.
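To make "transformation-based" learning concrete, here is a minimal sketch of one way such a search could look: start from a broad, hand-written regular expression, apply small rewrites that narrow it, and greedily keep whichever rewrite most improves F1 on labeled examples. This is an illustration under stated assumptions, not the ReLIE algorithm itself, whose details the abstract does not give. The function names (`f1`, `greedy_refine`, `restrict_class`, `restrict_quantifier`), the two rewrites, and the toy data are all hypothetical.

```python
import re

def f1(pattern, positives, negatives):
    """F1 of whole-string matching against labeled example strings."""
    rx = re.compile(pattern)
    tp = sum(1 for s in positives if rx.fullmatch(s))
    fp = sum(1 for s in negatives if rx.fullmatch(s))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / len(positives)
    return 2 * precision * recall / (precision + recall)

# Hypothetical transformations: naive string rewrites that only narrow a regex.
def restrict_class(pattern):
    # Narrow the first \w (word character) to \d (digit).
    return pattern.replace(r"\w", r"\d", 1)

def restrict_quantifier(pattern):
    # Narrow the first unbounded '+' to a bounded repetition.
    # Assumption: the pattern contains no escaped '\+'.
    return pattern.replace("+", "{3,4}", 1)

def greedy_refine(initial, transformations, positives, negatives):
    """Hill-climb: apply the single best transformation until F1 stops improving."""
    current = initial
    best = f1(current, positives, negatives)
    while True:
        candidates = {t(current) for t in transformations} - {current}
        scored = [(f1(c, positives, negatives), c) for c in candidates]
        if not scored:
            return current, best
        top_score, top = max(scored)
        if top_score <= best:
            return current, best
        current, best = top, top_score

# Toy labeled data (assumed for illustration only).
positives = ["555-1234", "800-0000"]
negatives = ["foo-bar", "5-1", "well-known"]

learned, score = greedy_refine(
    r"\w+-\w+", [restrict_class, restrict_quantifier], positives, negatives
)
print(learned, score)
```

On this toy data the search narrows `\w+-\w+` first to `\d+-\w+` and then to `\d{3,4}-\w+`, where it stops because no remaining rewrite raises F1 further. A realistic system would rewrite a parsed regex rather than raw strings, and would score candidates on annotated documents rather than isolated strings.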