In this article we present a semi-supervised algorithm for discovering information extraction patterns in textual data. The discovered patterns take the form of regular expressions, each of which defines a regular language. We call the approach 'semi-supervised' because developing its training set requires significantly less effort than other approaches demand. From the training data, the algorithm automatically generates regular expressions that can then be applied to previously unseen data for information extraction. Our experiments show that the algorithm performs well in testing on many features that are important in the fight against terrorism.
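
To make the idea of generating regular expressions from training data concrete, the following is a minimal sketch and not the algorithm described in this article. It assumes a deliberately simple generalization rule: each training string is mapped to a sequence of character classes, runs of the same class are collapsed into counted repetitions, and the resulting expressions are joined by alternation. The function names (`char_class`, `generalize`, `induce_pattern`) and the phone-number examples are hypothetical and chosen only for illustration.

```python
import re
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse regex class (an illustrative choice)."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    if ch.isspace():
        return r"\s"
    return re.escape(ch)  # keep punctuation literal


def generalize(example):
    """Turn one training string into a regex by collapsing runs of a class."""
    classes = [char_class(c) for c in example]
    parts = []
    for cls, run in groupby(classes):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def induce_pattern(examples):
    """Combine the generalized examples into a single alternation."""
    # Longer alternatives go first so they are tried before shorter ones.
    alternatives = sorted({generalize(e) for e in examples},
                          key=lambda s: (-len(s), s))
    return re.compile("|".join(f"(?:{a})" for a in alternatives))


# Hypothetical training examples for a "phone number" feature.
training = ["555-1234", "555-9876", "(555) 123-4567"]
pattern = induce_pattern(training)

# Apply the induced pattern to text that was not in the training set.
unseen = "Call 867-5309 or (212) 555-0199 after 5 pm."
matches = pattern.findall(unseen)
print(matches)
```

In this sketch the first two training strings generalize to the same expression, so three examples yield only two alternatives. A practical system would need a far richer generalization step and a way to score candidate patterns against the training data, which this toy example does not attempt.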