A Comparative Study of Information Extraction Strategies

Authors:
Ronen Feldman;Yonatan Aumann;Michal Finkelstein-Landau;Eyal Hurvitz;Yizhar Regev;Ariel Yaroshevich
Venue:
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2002

Abstract

The availability of online text documents exposes readers to a vast amount of potentially valuable knowledge buried therein. The sheer scale of the material has created a pressing need for automated methods of discovering relevant information without having to read it all. Hence the growing interest in recent years in Text Mining.

A common approach to Text Mining is Information Extraction (IE): extracting specific types (or templates) of information from a document collection. Although many works on IE have been published, researchers have paid little attention to evaluating how much syntactic and semantic analysis using Natural Language Processing (NLP) techniques contributes to the quality of IE results.

In this work we try to quantify the contribution of NLP techniques by comparing three strategies for IE: naïve co-occurrence, ordered co-occurrence, and the structure-driven method, a rule-based strategy that relies on syntactic analysis followed by the extraction of suitable semantic templates. We use the three strategies to extract two templates from financial news stories. We show that the structure-driven strategy provides significantly better precision than the other two strategies (80-90% for the structure-driven strategy, compared with only about 60% for naïve co-occurrence and ordered co-occurrence). These results indicate that syntactic and semantic analysis is necessary if one wishes to obtain high accuracy.
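
The abstract names the two co-occurrence baselines without defining them, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that naïve co-occurrence pairs any two known entities that share a sentence with a trigger word, and that ordered co-occurrence additionally requires the pattern "entity, trigger, entity" in that order. The lexicons `COMPANIES` and `TRIGGERS`, the example sentence, and all function names are hypothetical and chosen for illustration.

```python
import re

# Hypothetical lexicons; a real system would use a named-entity recognizer.
COMPANIES = {"acme", "globex", "initech"}
TRIGGERS = {"acquired", "acquires", "bought"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def naive_cooccurrence(sentence):
    """Pair every two distinct companies in a sentence that has a trigger word."""
    tokens = tokenize(sentence)
    if not TRIGGERS & set(tokens):
        return []
    companies = [t for t in tokens if t in COMPANIES]
    return [(a, b) for a in companies for b in companies if a != b]


def ordered_cooccurrence(sentence):
    """Pair a company before a trigger word with a company after it."""
    tokens = tokenize(sentence)
    pairs = []
    for i, token in enumerate(tokens):
        if token in TRIGGERS:
            before = [t for t in tokens[:i] if t in COMPANIES]
            after = [t for t in tokens[i + 1:] if t in COMPANIES]
            pairs.extend((a, b) for a in before for b in after if a != b)
    return pairs


def precision(extracted, gold):
    """Fraction of extracted pairs that appear in the gold set."""
    if not extracted:
        return 0.0
    return sum(1 for pair in extracted if pair in gold) / len(extracted)


sentence = "Acme acquired Globex after Initech withdrew its bid."
gold = {("acme", "globex")}  # the only true (acquirer, acquired) pair

naive_pairs = naive_cooccurrence(sentence)
ordered_pairs = ordered_cooccurrence(sentence)
print(len(naive_pairs), precision(naive_pairs, gold))
print(ordered_pairs, precision(ordered_pairs, gold))
```

On this toy sentence, word order removes some spurious pairs but still links Acme to Initech, which sits in a separate clause. Ruling out that pair requires the kind of syntactic analysis that the structure-driven strategy relies on. The numbers here illustrate the mechanism only and are unrelated to the precision figures reported in the abstract.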