Incremental information extraction using tree-based context representations

  • Authors: Christian Siefkes
  • Affiliations: Berlin-Brandenburg Graduate School in Distributed Information Systems, Database and Information Systems Group, Freie Universität Berlin, Berlin, Germany
  • Venue: CICLing'05: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year: 2005

Abstract

The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form suitable for automatic processing. Providing annotated training data to adapt a trainable IE system to a new domain requires a considerable amount of work. To address this, we explore incremental learning: training documents are annotated sequentially by a user and immediately incorporated into the extraction model. The system can thus support the user by proposing extractions based on the current extraction model, reducing the user's workload over time. We introduce an approach that models IE as a token classification task and allows incremental training. To provide sufficient information to the token classifiers, we use rich, tree-based context representations of each token as feature vectors. These representations combine the heuristically deduced document structure with linguistic and semantic information. We treat the resulting feature vectors as ordered and combine proximate features into more expressive joint features, called “Orthogonal Sparse Bigrams” (OSB). Our results indicate that this setup makes it possible to employ IE in an incremental fashion without a serious performance penalty.
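
The OSB combination step can be pictured with a short sketch. The Python snippet below is a minimal, hypothetical illustration of how ordered context features inside a small sliding window might be paired into joint features with skip markers; the window size, the `<skip>` placeholder, and the example feature names are illustrative assumptions rather than details taken from the paper.

```python
from typing import List


def osb_features(features: List[str], window: int = 5) -> List[str]:
    """Combine proximate ordered features into joint OSB-style features.

    Each feature is paired with every later feature inside the window;
    skipped positions are marked so the distance between the two
    component features is preserved in the joint feature.
    """
    joint = []
    for i, head in enumerate(features):
        for dist in range(1, window):
            j = i + dist
            if j >= len(features):
                break
            # "<skip>" markers encode how far apart the two features are.
            joint.append(" ".join([head] + ["<skip>"] * (dist - 1) + [features[j]]))
    return joint


if __name__ == "__main__":
    # Hypothetical context features for one token.
    ctx = ["word=Berlin", "pos=NNP", "capitalized", "in=heading"]
    for f in osb_features(ctx, window=4):
        print(f)
```

Because each joint feature records the distance between its two components, a token classifier can distinguish immediately adjacent features from those that are further apart in the ordered feature vector.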