The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form suitable for automatic processing. Providing annotated training data to adapt a trainable IE system to a new domain requires a considerable amount of work. To address this, we explore incremental learning: training documents are annotated sequentially by a user, and their annotations are immediately incorporated into the extraction model. The system can thus support the user by proposing extractions based on the current extraction model, reducing the user's workload over time. We introduce an approach that models IE as a token classification task and allows incremental training. To provide sufficient information to the token classifiers, we use rich, tree-based context representations of each token as feature vectors. These representations draw on heuristically derived document structure in addition to linguistic and semantic information. We treat the resulting feature vectors as ordered and combine proximate features into more expressive joint features, called “Orthogonal Sparse Bigrams” (OSB). Our results indicate that this setup makes it possible to employ IE in an incremental fashion without a serious performance penalty.
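The OSB combination of proximate features can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name and the `<d>` distance-marker format are assumptions, and the window size is a free parameter. Each feature is paired with every preceding feature within the window, with the gap between them encoded in the joint feature.

```python
def osb_features(tokens, window=4):
    """Generate Orthogonal Sparse Bigram (OSB) joint features.

    Each token is combined with every preceding token that lies
    within `window` positions, and the number of skipped positions
    is encoded in the feature string, so "a <1> c" means tokens
    "a" and "c" with one token skipped between them.
    (Illustrative sketch; the exact encoding may differ.)
    """
    feats = []
    for i, tok in enumerate(tokens):
        for d in range(1, window):
            j = i - d
            if j < 0:
                break
            feats.append(f"{tokens[j]} <{d - 1}> {tok}")
    return feats

# Example: a three-token sequence with window=3 yields all pairs
# up to two positions apart, each tagged with its skip distance.
print(osb_features(["a", "b", "c"], window=3))
# → ['a <0> b', 'b <0> c', 'a <1> c']
```

Because each joint feature is just a string, the output can feed any sparse linear classifier that supports online updates, which is what makes the setup compatible with incremental training.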