A modular information extraction system

Authors:
Ronen Feldman; Yizhar Regev; Maya Gorodetsky
Affiliations:
Information Systems Department, School of Business Administration, Hebrew University of Jerusalem, Jerusalem 91905, Israel (all three authors; corresponding author: ronen.feldman@huji.ac.il)
Venue:
Intelligent Data Analysis
Year:
2008

Abstract

In today's information age, the volume of text documents available electronically (on the Web, on corporate intranets, on news wires and elsewhere) is overwhelming. Search engines and information retrieval systems are useful for finding documents that satisfy a given query, but they offer little help in analyzing the content of those unstructured documents. Text mining is the automated analysis of unstructured, natural language text in order to discover information and knowledge that are otherwise difficult to retrieve. Information Extraction (IE) focuses on finding entities and the relations between them in free text, and it provides a solid foundation for text mining. In this paper we present a modular IE system based on the DIAL language. DIAL allows users to rapidly implement IE solutions for various domains on top of a common Natural Language Processing (NLP) infrastructure. We describe in detail a system for extracting relations in the intelligence news domain, present an evaluation of that system, and discuss enhancements for other domains, such as email.
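The abstract frames IE as finding entities and relations in free text, with domain-specific solutions built on a shared NLP layer. The Python sketch below illustrates that split under stated assumptions only. It is not DIAL and does not reproduce the authors' system. The gazetteer-based tagger, `make_between_rule`, `extract`, and the `EmployedBy` relation are hypothetical stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


@dataclass(frozen=True)
class Relation:
    name: str
    arg1: Entity
    arg2: Entity


# Hypothetical shared "infrastructure" layer: a toy dictionary (gazetteer)
# tagger standing in for a real NLP pipeline.
def tag_entities(text: str, gazetteer: Dict[str, str]) -> List[Entity]:
    entities = []
    for phrase, label in gazetteer.items():
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            entities.append(Entity(m.group(0), label, m.start(), m.end()))
    return sorted(entities, key=lambda e: e.start)


# A domain rule receives the text plus tagged entities and returns relations.
RelationRule = Callable[[str, List[Entity]], List[Relation]]


def make_between_rule(name: str, label1: str, label2: str,
                      trigger_regex: str) -> RelationRule:
    """Hypothetical rule: relate a label1 entity to a later label2 entity
    when the text between them fully matches trigger_regex."""
    pattern = re.compile(trigger_regex)

    def rule(text: str, entities: List[Entity]) -> List[Relation]:
        found = []
        for i, e1 in enumerate(entities):
            for e2 in entities[i + 1:]:
                if e1.label == label1 and e2.label == label2:
                    if pattern.fullmatch(text[e1.end:e2.start]):
                        found.append(Relation(name, e1, e2))
        return found

    return rule


def extract(text: str, gazetteer: Dict[str, str],
            rules: List[RelationRule]) -> List[Relation]:
    # Shared layer runs once; each domain rule is applied on top of it.
    entities = tag_entities(text, gazetteer)
    relations: List[Relation] = []
    for rule in rules:
        relations.extend(rule(text, entities))
    return relations


if __name__ == "__main__":
    gazetteer = {"John Smith": "PERSON", "Acme Corp": "ORG"}
    rules = [make_between_rule("EmployedBy", "PERSON", "ORG",
                               r",? (?:works for|an employee of|joined) ")]
    for rel in extract("John Smith joined Acme Corp in 2005.", gazetteer, rules):
        print(rel.name, rel.arg1.text, rel.arg2.text)
    # prints: EmployedBy John Smith Acme Corp
```

Keeping the tagger separate from the rule list loosely mirrors the modularity the abstract describes: a new domain would supply its own gazetteer and rules while `extract` stays unchanged.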