A maximum entropy approach to natural language processing
Computational Linguistics
DTD inference for views of XML data
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Automata theory for XML researchers
ACM SIGMOD Record
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Reverse Engineering for Web Data: From Visual to Semantic Structures
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Document Transformation System from Papers to XML Data Based on Pivot XML Document Method
ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
A comparison of algorithms for maximum entropy parameter estimation
COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Hierarchical hidden Markov models for information extraction
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Probabilistic Model for Structured Document Mapping
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
PKDD 2007 Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases
Applications of Reinforcement Learning to Structured Prediction
Recent Advances in Reinforcement Learning
Structured prediction with reinforcement learning
Machine Learning
From layout to semantic: a reranking model for mapping web documents to mediated XML representations
Large Scale Semantic Access to Content (Text, Image, Video, and Sound)
Hi-index | 0.00 |
We consider the problem of semantic annotation of semi-structured documents according to a target XML schema. The task is to annotate a document in a tree-like manner where the annotation tree is an instance of a tree class defined by DTD or W3C XML Schema descriptions. In the probabilistic setting, we cope with the tree annotation problem as a generalized probabilistic context-free parsing of an observation sequence where each observation comes with a probability distribution over terminals supplied by a probabilistic classifier associated with the content of documents. We determine the most probable tree annotation by maximizing the joint probability of selecting a terminal sequence for the observation sequence and the most probable parse for the selected terminal sequence.