Regular expressions have been used for information extraction tasks in a variety of domains. The alphabet of a regular expression can consist either of the tokens relevant to the entity of interest or of individual characters, in which case the alphabet becomes very large. Noise in unstructured text documents, together with this larger alphabet, poses a significant challenge both for entity extraction and for algorithmically learning complex regular expressions. In this paper, we present a novel algorithm for regular expression learning that clusters similar matches to obtain the corresponding regular expressions, identifies and eliminates noisy clusters, and finally uses a weighted disjunction of the most promising candidate regular expressions to obtain the final expression. The experimental results show high precision and recall for this final expression, which supports the applicability of our approach to entity extraction tasks of practical importance.
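The three stages named in the abstract (cluster similar matches, drop noisy clusters, combine the survivors into a disjunction) can be illustrated with a minimal sketch. It is not the paper's algorithm: the character-class `signature` used for clustering, the `min_support` noise threshold, and the use of cluster size as the weight that orders the alternatives are all assumptions chosen only to make the idea concrete.

```python
import re
import string
from collections import Counter
from itertools import groupby


def signature(text):
    """Generalize a string into a regex: ASCII digits and letters become classes."""
    def char_class(ch):
        if ch in string.digits:
            return r"\d"
        if ch in string.ascii_letters:
            return "[A-Za-z]"
        return re.escape(ch)  # keep punctuation and spaces as literals

    parts = []
    for cls, run in groupby(text, key=char_class):
        n = len(list(run))
        # Collapse a run of the same class into a counted repetition.
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def learn_regex(matches, min_support=2):
    """Cluster matches by signature, drop small clusters, join the rest with '|'."""
    # Step 1 (assumed clustering): matches with the same signature share a cluster.
    clusters = Counter(signature(m) for m in matches)
    # Step 2 (assumed noise rule): clusters below min_support are treated as noise.
    kept = [(pat, n) for pat, n in clusters.most_common() if n >= min_support]
    if not kept:
        return None
    # Step 3 (assumed weighting): alternatives are ordered by cluster size.
    return "|".join(f"(?:{pat})" for pat, _ in kept)


examples = [
    "555-1234", "555-9876",
    "(555) 123-4567", "(555) 987-6543", "(555) 222-3333",
    "5x5-12z4",  # a noisy match that forms a singleton cluster
]
pattern = learn_regex(examples)
```

With these hypothetical examples, the learned pattern accepts unseen strings of either phone-number shape, while the singleton cluster produced by the noisy match is discarded.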