Active Hidden Markov Models for Information Extraction

Authors:
Tobias Scheffer; Christian Decomain; Stefan Wrobel
Venue:
IDA '01 Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis
Year:
2001

Abstract

Information extraction from HTML documents requires a classifier capable of assigning semantic labels to the words or word sequences to be extracted. If completely labeled documents are available for training, well-known Markov model techniques can be used to learn such classifiers. In this paper, we consider the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents are available for training. We first give a detailed account of the task and its appropriate loss function, and show how this loss can be minimized given an HMM. We describe an EM-style algorithm for learning HMMs from partially labeled data. We then present an active learning algorithm that selects "difficult" unlabeled tokens and asks the user to label them. We study empirically how much active learning reduces the required labeling effort, or increases the quality of the learned model achievable with a given amount of user effort.
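The abstract does not say how a token is judged "difficult". The sketch below illustrates one plausible reading, not the authors' algorithm. It computes per-token posterior state probabilities with the forward-backward procedure, clamps tokens whose labels are already known, and ranks the unlabeled tokens by the margin between their two most probable states. The margin criterion, the discrete-emission HMM, and all names (forward_backward, margin, most_difficult_tokens) are assumptions made for illustration only.

```python
def forward_backward(pi, A, B, obs, labels):
    """Posterior state distribution per token for a discrete HMM.

    pi[i]: initial probability of state i
    A[i][j]: transition probability from state i to state j
    B[i][o]: probability that state i emits symbol o
    obs[t]: observed symbol at position t
    labels[t]: known state index at position t, or None if unlabeled
    (Hypothetical interface; labeled tokens are clamped to their state.)
    """
    n, T = len(pi), len(obs)

    def emit(i, t):
        # A labeled token rules out every state except its label.
        if labels[t] is not None and labels[t] != i:
            return 0.0
        return B[i][obs[t]]

    # Forward pass, rescaled at each step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * emit(i, 0) for i in range(n)]
        else:
            row = [emit(i, t) * sum(alpha[t - 1][j] * A[j][i] for j in range(n))
                   for i in range(n)]
        c = sum(row)
        alpha.append([x / c for x in row])
        scale.append(c)

    # Backward pass, using the same scaling factors.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * emit(j, t + 1) * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]

    # Combine both passes and normalize each position.
    gamma = []
    for t in range(T):
        row = [alpha[t][i] * beta[t][i] for i in range(n)]
        z = sum(row)
        gamma.append([x / z for x in row])
    return gamma


def margin(dist):
    """Gap between the two most probable states; a small gap means an uncertain token."""
    top = sorted(dist, reverse=True)
    return top[0] - top[1]


def most_difficult_tokens(gamma, labels, k=1):
    """Indices of the k unlabeled tokens with the smallest posterior margin."""
    candidates = [t for t, y in enumerate(labels) if y is None]
    return sorted(candidates, key=lambda t: margin(gamma[t]))[:k]
```

In an active learning loop of this kind, the selected tokens would be shown to the user, their labels written into labels, and the model re-estimated before the next query. The re-estimation step (the EM-style training mentioned in the abstract) is not sketched here.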