Regular expressions are the dominant technique for extracting business-relevant entities (e.g., invoice numbers or product names) from text data (e.g., invoices), since these entity types often follow a strict underlying syntactic pattern. However, constructing regular expressions by hand that guarantee high recall and precision is tedious and requires expert knowledge. In this paper, we propose an approach that automatically infers regular expressions from a set of positive sample entities, which can in turn be derived either from enterprise databases (e.g., a product catalog) or from annotated documents (e.g., historical invoices). The main innovation of our approach is that it learns effective regular expressions that a user can easily interpret and modify. This effectiveness is obtained by a novel method that weights dependent entity features of different granularity (i.e., at the character and token level) against each other and selects the most suitable ones to form a regular expression.
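To make the idea of inferring a regular expression from positive samples concrete, the following is a minimal sketch and not the method proposed in the paper. It uses character-level features only, does no feature weighting or selection, and ignores token-level features entirely. The function names (`char_class`, `signature`, `infer_regex`), the ASCII-only character classes, and the sample invoice numbers are all assumptions made for illustration.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map one character to a coarse regex class (illustrative, ASCII only)."""
    if "0" <= ch <= "9":
        return "[0-9]"
    if "A" <= ch <= "Z":
        return "[A-Z]"
    if "a" <= ch <= "z":
        return "[a-z]"
    # Anything else (separators such as '-' or '/') is kept as a literal.
    return re.escape(ch)


def signature(sample):
    """Collapse a sample into a list of (class, run_length) pairs."""
    return [(cls, len(list(run))) for cls, run in groupby(sample, key=char_class)]


def infer_regex(samples):
    """Generalize positive samples that share one class sequence into a regex.

    Hypothetical helper: it only widens run lengths to the observed
    minimum and maximum, and fails if the samples differ in structure.
    """
    sigs = [signature(s) for s in samples]
    classes = [cls for cls, _ in sigs[0]]
    if any([cls for cls, _ in sig] != classes for sig in sigs):
        raise ValueError("samples do not share one character-class sequence")

    parts = []
    for i, cls in enumerate(classes):
        lengths = [sig[i][1] for sig in sigs]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quantifier = ""
        elif lo == hi:
            quantifier = "{%d}" % lo
        else:
            quantifier = "{%d,%d}" % (lo, hi)
        parts.append(cls + quantifier)
    return "".join(parts)


# Made-up invoice numbers standing in for entries from a database or annotations.
samples = ["INV-2023-0042", "INV-2024-1187", "INV-2022-990"]
pattern = infer_regex(samples)
print(pattern)
print(bool(re.fullmatch(pattern, "INV-2025-0001")))
```

The resulting expression stays readable because each part corresponds to one visible run in the samples, which is the kind of interpretability the abstract emphasizes. A sketch like this overgeneralizes or undergeneralizes easily, which is where choosing between features of different granularity would come in.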