We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoC project, which aims to automate the conversion of layout annotations in layout-oriented formats such as PDF, PS and HTML into semantic-oriented annotations. A toolkit of components covers complementary techniques: logical document analysis and semantic annotation with machine learning methods. We use a real conversion project as a running example to illustrate the techniques implemented in the project.