Information extraction from HTML documents requires a classifier capable of assigning semantic labels to the words or word sequences to be extracted. If completely labeled documents are available for training, well-known Markov model techniques can be used to learn such classifiers. In this paper, we consider the more challenging task of learning hidden Markov models (HMMs) when only partially (sparsely) labeled documents are available for training. We first give a detailed account of the task and its appropriate loss function, and show how this loss can be minimized given an HMM. We then describe an EM-style algorithm for learning HMMs from partially labeled data. Finally, we present an active learning algorithm that selects "difficult" unlabeled tokens and asks the user to label them. We study empirically how much active learning reduces the required data labeling effort, or increases the quality of the learned model achievable with a given amount of user effort.
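To make the selection step concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes that a token's "difficulty" can be approximated by the margin between the two most probable states under the forward-backward posterior, and that known labels are handled by zeroing the emission score of contradicting states. The abstract does not specify either choice, and the function names `posteriors` and `most_difficult_token` are hypothetical. No scaling is applied, so the sketch is only suitable for short sequences.

```python
def posteriors(pi, A, B, obs, labels):
    """Forward-backward state posteriors for one sequence.

    pi[s]: initial probability, A[r][s]: transition r -> s,
    B[s][o]: emission of symbol o in state s,
    labels[t]: known state index for token t, or None if unlabeled.
    """
    n, T = len(pi), len(obs)
    # Emission scores, zeroed for states that contradict a known label
    # (an assumed way of clamping partially labeled tokens).
    e = [[B[s][obs[t]] if labels[t] in (None, s) else 0.0 for s in range(n)]
         for t in range(T)]
    # Forward pass.
    alpha = [[pi[s] * e[0][s] for s in range(n)]]
    for t in range(1, T):
        alpha.append([e[t][s] * sum(alpha[t - 1][r] * A[r][s] for r in range(n))
                      for s in range(n)])
    # Backward pass.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[s][r] * e[t + 1][r] * beta[t + 1][r] for r in range(n))
                   for s in range(n)]
    # Normalize alpha * beta per token to obtain state posteriors.
    gamma = []
    for t in range(T):
        row = [alpha[t][s] * beta[t][s] for s in range(n)]
        z = sum(row)
        gamma.append([x / z for x in row])
    return gamma


def most_difficult_token(gamma, labels):
    """Index of the unlabeled token with the smallest top-two posterior margin."""
    best, best_margin = None, float("inf")
    for t, row in enumerate(gamma):
        if labels[t] is not None:
            continue  # already labeled, nothing to ask the user
        top = sorted(row, reverse=True)
        margin = top[0] - top[1]
        if margin < best_margin:
            best, best_margin = t, margin
    return best


# Toy model (illustrative values only): symbol 2 is equally likely in both states.
pi = [0.5, 0.5]
A = [[0.5, 0.5], [0.5, 0.5]]
B = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]
obs = [0, 2, 1]
labels = [None, None, None]
print(most_difficult_token(posteriors(pi, A, B, obs, labels), labels))  # prints 1
```

In an active learning loop under these assumptions, the returned token would be shown to the user, its label written into `labels`, and the model re-estimated before the next query.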