Symbol Recognition by Error-Tolerant Subgraph Matching between Region Adjacency Graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (Graph Algorithms and Computer Vision).
Structural Matching in Computer Vision Using Probabilistic Relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Visual Web Information Extraction with Lixto. Proceedings of the 27th International Conference on Very Large Data Bases.
The Lixto Project: Exploring New Frontiers of Web Data Extraction. BNCOD'06: Proceedings of the 23rd British National Conference on Databases, Conference on Flexible and Efficient Information Handling.
Wrapping is the process of navigating a data source, semi-automatically extracting data, and transforming it into a form suitable for data processing applications. A number of established products for wrapping data from web pages are currently on the market. One such product is Lixto [1], which grew out of research performed at our institute. Our work extends the wrapping functionality of Lixto to PDF documents. This is a challenging task because PDF is a relatively unstructured format: it primarily describes where text and graphics are placed on a page rather than the logical structure of the content. We have developed a method that segments a page into blocks, which are represented as nodes in a relational graph. This paper describes our current research on applying relational matching techniques to this graph in order to locate wrapping instances.
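To make the idea of matching on a graph of page blocks more concrete, here is a minimal sketch under stated assumptions; it is not the authors' method. It assumes blocks have already been segmented into rectangles with a coarse type label, and it uses two hypothetical spatial relations (`left_of`, `above`) within a hypothetical `gap` threshold to build the relational graph. The `Block`, `relation`, `build_graph` and `match` names, the unit mismatch costs and the coordinate convention are all illustrative assumptions. The matcher is a plain backtracking error-tolerant subgraph search, swapped in for simplicity; it is not the probabilistic relaxation or region-adjacency matching of the works listed above.

```python
from dataclasses import dataclass

# Hypothetical block record: a rectangular page region with a coarse type label.
# Assumption: top-left origin with y growing downwards (PDF's default user
# space instead puts the origin at the bottom-left of the page).
@dataclass(frozen=True)
class Block:
    name: str
    kind: str  # hypothetical labels such as "text" or "number"
    x0: float
    y0: float
    x1: float
    y1: float


def relation(a, b, gap=20.0):
    """Return the hypothetical spatial relation from block a to block b, or None."""
    overlap_y = min(a.y1, b.y1) > max(a.y0, b.y0)
    overlap_x = min(a.x1, b.x1) > max(a.x0, b.x0)
    if overlap_y and a.x1 <= b.x0 and b.x0 - a.x1 <= gap:
        return "left_of"
    if overlap_x and a.y1 <= b.y0 and b.y0 - a.y1 <= gap:
        return "above"
    return None


def build_graph(blocks, gap=20.0):
    """Relational graph: node name -> kind, and (src, dst) -> relation label."""
    nodes = {b.name: b.kind for b in blocks}
    edges = {}
    for a in blocks:
        for b in blocks:
            if a.name != b.name:
                rel = relation(a, b, gap)
                if rel is not None:
                    edges[(a.name, b.name)] = rel
    return nodes, edges


def match(pattern, graph, max_cost=0):
    """Backtracking error-tolerant subgraph search.

    Each pattern node maps to a distinct graph node. A node-kind mismatch
    costs 1, and each pattern edge that is missing or differently labelled
    in the graph costs 1; extra graph edges are free. Every mapping whose
    total cost stays within max_cost is returned together with its cost.
    """
    p_nodes, p_edges = pattern
    g_nodes, g_edges = graph
    order = list(p_nodes)
    results = []

    def extend(i, mapping, cost):
        if cost > max_cost:
            return  # prune: this partial mapping is already too expensive
        if i == len(order):
            results.append((dict(mapping), cost))
            return
        p = order[i]
        for g in g_nodes:
            if g in mapping.values():
                continue  # keep the mapping injective
            c = cost + (p_nodes[p] != g_nodes[g])
            # Check pattern edges between p and every already-mapped node q.
            for q in order[:i]:
                gq = mapping[q]
                for pu, pv, gu, gv in ((p, q, g, gq), (q, p, gq, g)):
                    label = p_edges.get((pu, pv))
                    if label is not None and g_edges.get((gu, gv)) != label:
                        c += 1
            mapping[p] = g
            extend(i + 1, mapping, c)
            del mapping[p]

    extend(0, {}, 0)
    return results


# Toy page: two "label: value" rows, one above the other.
page = [
    Block("A", "text", 0, 0, 50, 10),       # e.g. "Price:"
    Block("B", "number", 60, 0, 100, 10),   # e.g. "12.50"
    Block("C", "text", 0, 30, 50, 40),      # e.g. "Total:"
    Block("D", "number", 60, 30, 100, 40),  # e.g. "99.00"
]
graph = build_graph(page)
pattern = ({"label": "text", "value": "number"},
           {("label", "value"): "left_of"})

print(match(pattern, graph))
# [({'label': 'A', 'value': 'B'}, 0), ({'label': 'C', 'value': 'D'}, 0)]
```

With `max_cost=0` the pattern finds both label/value rows exactly. Raising `max_cost` to 1 also admits near matches such as `A` paired with `D`, which lacks the required `left_of` edge; in this sketch such tolerance is how slightly irregular layouts could still yield candidate wrapping instances, at the price of more false candidates to filter.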