The availability of online text documents exposes readers to a vast amount of potentially valuable knowledge. The sheer scale of this material has created a pressing need for automated methods of discovering relevant information without reading it all, hence the growing interest in Text Mining in recent years.

A common approach to Text Mining is Information Extraction (IE): extracting specific types (or templates) of information from a document collection. Although many works on IE have been published, researchers have paid little attention to evaluating how much syntactic and semantic analysis using Natural Language Processing (NLP) techniques contributes to the quality of IE results.

In this work we try to quantify the contribution of NLP techniques by comparing three strategies for IE: naïve co-occurrence, ordered co-occurrence, and the structure-driven method, a rule-based strategy that relies on syntactic analysis followed by the extraction of suitable semantic templates. We apply the three strategies to the extraction of two templates from financial news stories. We show that the structure-driven strategy yields significantly better precision than the other two (80-90% for the structure-driven strategy, compared with only about 60% for naïve and ordered co-occurrence). These results indicate that syntactic and semantic analysis is necessary if one wishes to obtain high accuracy.
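The contrast between the two co-occurrence baselines named in the abstract can be illustrated with a minimal sketch. The abstract does not define the strategies precisely, so the definitions here are assumptions: naïve co-occurrence is taken to mean that all template terms appear in the same sentence, and ordered co-occurrence that they appear in a fixed order. The acquisition-style template, the company names, and every function name are hypothetical and are not taken from the paper.

```python
import re


def tokens(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def naive_cooccurrence(sentence, first, trigger, second):
    """Assumed definition: all three terms occur somewhere in the sentence."""
    toks = tokens(sentence)
    return all(term in toks for term in (first, trigger, second))


def ordered_cooccurrence(sentence, first, trigger, second):
    """Assumed definition: the terms occur in the order first, trigger, second."""
    toks = tokens(sentence)
    try:
        i = toks.index(first)
        j = toks.index(trigger, i + 1)   # trigger must follow `first`
        toks.index(second, j + 1)        # `second` must follow the trigger
        return True
    except ValueError:
        return False


def precision(predicted, gold):
    """Fraction of predicted matches that are correct (0.0 if none predicted)."""
    true_positives = sum(1 for p, g in zip(predicted, gold) if p and g)
    n_predicted = sum(1 for p in predicted if p)
    return true_positives / n_predicted if n_predicted else 0.0


# Hypothetical sentences for an "acme acquired globex" template.
s1 = "Acme acquired Globex last year."
s2 = "Globex acquired Acme last year."

naive_hits = [naive_cooccurrence(s, "acme", "acquired", "globex") for s in (s1, s2)]
ordered_hits = [ordered_cooccurrence(s, "acme", "acquired", "globex") for s in (s1, s2)]
gold = [True, False]  # only s1 actually states that Acme acquired Globex
```

In this toy case the naïve strategy matches both sentences, while the ordered variant rejects the one with the roles reversed. Neither baseline inspects syntactic structure, which is the gap the structure-driven strategy is described as addressing.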