We consider the problem of converting documents from rendering-oriented HTML markup into semantic-oriented XML annotation defined by user-specific DTDs or XML Schema descriptions. We represent both source and target documents as rooted ordered trees, so that the conversion can be achieved by applying a set of tree transformations. We cast the conversion task in a supervised learning framework, in which the tree transformations are learned from a set of training examples. Because of the complexity of tree-to-tree transformations, we develop a two-step approach to the conversion problem that first labels the leaves of the source trees and then recomposes target trees from the leaf labels. We present two solutions, based on classifying leaves with target terminals and with target paths, respectively. Moreover, we develop three methods for the leaf classification. All methods and solutions have been tested on two real collections.
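The two-step idea (label the source leaves, then recompose a target tree from the labels) can be illustrated with a minimal Python sketch. It is not the method evaluated in this work. The tag-to-path lookup in `label_leaves` is a hypothetical stand-in for a trained leaf classifier. The element names (`article`, `title`, `author`, `body`, `para`) are invented for illustration. The recomposition rule, which reuses an inner element only when it is the most recent child of its parent, is a simplifying assumption about how path labels might be merged.

```python
import xml.etree.ElementTree as ET


def label_leaves(leaves):
    """Step 1 (stub): assign a target path to each source leaf.

    `leaves` is a list of (html_tag, text) pairs in document order.
    A real system would use a learned classifier; this fixed lookup
    on the enclosing HTML tag is purely illustrative.
    """
    tag_to_path = {
        "h1": "article/title",
        "i": "article/author",
        "p": "article/body/para",
    }
    return [(text, tag_to_path[tag]) for tag, text in leaves]


def recompose(labeled):
    """Step 2: rebuild a target tree from (text, path) pairs.

    Leaves are processed in document order. An inner element is reused
    only if it is the last child of its parent, which keeps the output
    ordered; otherwise a new inner element is opened.
    """
    root = None
    for text, path in labeled:
        steps = path.split("/")
        if len(steps) < 2:
            raise ValueError("a path needs a root and a terminal: %r" % path)
        if root is None:
            root = ET.Element(steps[0])
        elif root.tag != steps[0]:
            raise ValueError("all paths must share one root element")
        node = root
        for name in steps[1:-1]:
            last = node[-1] if len(node) else None
            if last is not None and last.tag == name:
                node = last  # continue inside the currently open element
            else:
                node = ET.SubElement(node, name)
        leaf = ET.SubElement(node, steps[-1])
        leaf.text = text
    return root


source_leaves = [
    ("h1", "Tree Conversion"),
    ("i", "A. Author"),
    ("p", "First."),
    ("p", "Second."),
]
target = recompose(label_leaves(source_leaves))
print(ET.tostring(target, encoding="unicode"))
```

In this sketch the two `p` leaves end up as sibling `para` elements under a single `body`, because the second leaf finds `body` still open as the last child of the root. Labeling leaves with terminals only (for example just `para`) would leave the inner structure to be inferred during recomposition, which is the harder of the two settings named above.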