Cascading use of soft and hard matching pattern rules for weakly supervised information extraction
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Combining relations for information extraction from free text
ACM Transactions on Information Systems (TOIS)
The ability to extract desired pieces of information from natural language texts is an important task with a growing number of potential applications. This paper presents a novel pattern rule induction learning system, GRID, which emphasizes using the global distribution of features across all training instances to make better rule induction decisions. GRID incorporates features at the lexical, syntactic and semantic levels simultaneously. It induces rules by combining top-down and bottom-up approaches. The features chosen in GRID are general and were applied successfully to both semi-structured and free text. Our experimental results on several publicly available webpage corpora and the MUC-4 test set indicate that our approach is effective.
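To make the idea of using a global feature distribution concrete, the following is a minimal sketch and not GRID's actual algorithm, which the abstract does not specify. It assumes each training instance is a map from token position (relative to the slot to be extracted) to a set of features, and it greedily adds the constraint that is most frequent across the positive instances and least frequent across the negative ones. The feature prefixes (`lex:`, `syn:`, `sem:`), the function names, the scoring, and the precision threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical instance format: relative token position -> set of features.
# Position 0 is the slot itself, -1 the token before it, and so on.
Instance = dict[int, set[str]]


def global_distribution(instances):
    """Count in how many instances each (position, feature) pair occurs."""
    counts = Counter()
    for inst in instances:
        for pos, feats in inst.items():
            for feat in feats:
                counts[(pos, feat)] += 1
    return counts


def matches(rule, inst):
    """A rule is a list of (position, feature) constraints that must all hold."""
    return all(feat in inst.get(pos, set()) for pos, feat in rule)


def induce_rule(positives, negatives, min_precision=0.9):
    """Greedily specialise a rule using counts over all covered instances."""
    rule = []
    pos_cov, neg_cov = list(positives), list(negatives)
    while True:
        total = len(pos_cov) + len(neg_cov)
        if rule and len(pos_cov) / total >= min_precision:
            return rule
        pos_counts = global_distribution(pos_cov)
        neg_counts = global_distribution(neg_cov)
        # Score = positive coverage minus negative coverage (an assumption).
        candidates = [
            (pos_counts[key] - neg_counts[key], key)
            for key in pos_counts
            if key not in rule
        ]
        if not candidates:
            return rule  # nothing left to add
        _, best = max(candidates)  # ties broken by the (position, feature) key
        rule.append(best)
        pos_cov = [inst for inst in pos_cov if matches(rule, inst)]
        neg_cov = [inst for inst in neg_cov if matches(rule, inst)]


# Toy data, invented for illustration only.
positives = [
    {-1: {"lex:by", "syn:PREP"}, 0: {"sem:PERSON"}},
    {-1: {"lex:by", "syn:PREP"}, 0: {"sem:PERSON"}},
    {-1: {"lex:from", "syn:PREP"}, 0: {"sem:PERSON"}},
]
negatives = [
    {-1: {"lex:in", "syn:PREP"}, 0: {"sem:LOCATION"}},
    {-1: {"lex:by", "syn:PREP"}, 0: {"sem:ORG"}},
]

print(induce_rule(positives, negatives))
```

On this toy data the semantic feature on the slot separates positives from negatives better than any single lexical or syntactic feature, so it is selected first. Mixing feature levels in one candidate pool is the point of the sketch; a real system would also need rule generalization and a more careful scoring function.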