Toward general-purpose learning for information extraction

Authors:
Dayne Freitag
Affiliations:
Carnegie Mellon University, Pittsburgh, PA
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Year:
1998

Abstract

Two trends are evident in the recent evolution of the field of information extraction: a preference for simple, often corpus-driven techniques over linguistically sophisticated ones; and a broadening of the central problem definition to include many non-traditional text domains. This development calls for information extraction systems that are as retargetable and general as possible. Here, we describe SRV, a learning architecture for information extraction that is designed for maximum generality and flexibility. SRV can exploit domain-specific information, including linguistic syntax and lexical information, in the form of features provided to the system explicitly as input for training. This process is illustrated using a domain created from Reuters corporate acquisitions articles. Features are derived from two general-purpose NLP systems, Sleator and Temperley's link grammar parser and WordNet. Experiments compare the learner's performance with and without such linguistic information. Surprisingly, in many cases, the system performs as well without this information as with it.
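
The idea of supplying features to the learner explicitly as input, and of comparing runs with and without linguistic features, can be pictured with a minimal sketch. This is not SRV's actual implementation: every name below (BASIC_FEATURES, LINGUISTIC_FEATURES, TOY_LEXICON, featurize) is hypothetical, and the toy lexicon is only a stand-in for a real resource such as WordNet or a parser.

```python
from typing import Callable, Dict, List

# A feature maps a single token to a value. All names here are hypothetical.
Feature = Callable[[str], object]

# Domain-independent features that need no linguistic resources.
BASIC_FEATURES: Dict[str, Feature] = {
    "capitalized": lambda tok: tok[:1].isupper(),
    "numeric": lambda tok: tok.isdigit(),
    "length": len,
}

# Toy stand-in for a lexical resource; NOT real WordNet data.
TOY_LEXICON = {"acquired": "verb.possession", "company": "noun.group"}

# Optional linguistic features, passed in only when they are wanted.
LINGUISTIC_FEATURES: Dict[str, Feature] = {
    "lexical_class": lambda tok: TOY_LEXICON.get(tok.lower(), "unknown"),
}


def featurize(tokens: List[str],
              features: Dict[str, Feature]) -> List[Dict[str, object]]:
    """Apply every supplied feature to every token."""
    return [{name: f(tok) for name, f in features.items()} for tok in tokens]


tokens = ["Acme", "acquired", "Widget", "Co"]

# Two experimental conditions: without and with linguistic information.
without_ling = featurize(tokens, BASIC_FEATURES)
with_ling = featurize(tokens, {**BASIC_FEATURES, **LINGUISTIC_FEATURES})
```

Because the feature set is an argument rather than part of the learner, retargeting to a new domain in this sketch means changing only the dictionary that is passed in.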