Towards a theory of declarative knowledge
Foundations of deductive databases and logic programming
Logical foundations of object-oriented and frame-based languages
Journal of the ACM (JACM)
Managing semistructured data with florid: a deductive object-oriented perspective
Information Systems - Special issue on semistructured data
Conceptual-model-based data extraction from multiple-record Web pages
Data & Knowledge Engineering
Snowball: extracting relations from large plain-text collections
DL '00 Proceedings of the fifth ACM conference on Digital libraries
Foundations of Databases: The Logical Level
Foundations of Databases: The Logical Level
A brief survey of web data extraction tools
ACM SIGMOD Record
SCIE '97 International Summer School on Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology
Visual Web Information Extraction with Lixto
Proceedings of the 27th International Conference on Very Large Data Bases
Extracting Patterns and Relations from the World Wide Web
WebDB '98 Selected papers from the International Workshop on The World Wide Web and Databases
Complexity and Expressive Power of Logic Programming
CCC '97 Proceedings of the 12th Annual IEEE Conference on Computational Complexity
Bootstrapping an ontology-based information extraction system
Intelligent exploration of the web
Web-scale information extraction in knowitall: (preliminary results)
Proceedings of the 13th international conference on World Wide Web
A survey of table recognition: Models, observations, transformations, and inferences
International Journal on Document Analysis and Recognition
The Lixto data extraction project: back and forth between theory and practice
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
IEEE Transactions on Knowledge and Data Engineering
Intelligent Text Extraction from PDF Documents
CIMCA '05 Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce Vol-2 (CIMCA-IAWTIC'06) - Volume 02
A Survey of Web Information Extraction Systems
IEEE Transactions on Knowledge and Data Engineering
Open information extraction from the web
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Adaptive information extraction from text by rule induction and generalisation
IJCAI'01 Proceedings of the 17th international joint conference on Artificial intelligence - Volume 2
Ontology-based information extraction for business intelligence
ISWC'07/ASWC'07 Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference
Ontology-driven information extraction with ontosyphon
ISWC'06 Proceedings of the 5th international conference on The Semantic Web
Text2Onto: a framework for ontology learning and data-driven change discovery
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Wrapping PDF documents exploiting uncertain knowledge
CAiSE'06 Proceedings of the 18th international conference on Advanced Information Systems Engineering
The lixto project: exploring new frontiers of web data extraction
BNCOD'06 Proceedings of the 23rd British National Conference on Databases, conference on Flexible and Efficient Information Handling
Hi-index | 0.00 |
Ontologies enable to directly encode domain knowledge in software applications, so ontology-based systems can exploit the meaning of information for providing advanced and intelligent functionalities. One of the most interesting and promising application of ontologies is information extraction from unstructured documents. In this area the extraction of meaningful information from PDF documents has been recently recognized as an important and challenging problem. This paper proposes an ontology-based information extraction system for PDF documents founded on a well suited knowledge representation approach named self-populating ontology (SPO ). The SPO approach combines object-oriented logic-based features with formal grammar capabilities and allows expressing knowledge in term of ontology schemas, instances, and extraction rules (called descriptors ) aimed at extracting information having also tabular form. The novel aspect of the SPO approach is that it allows to represent ontologies enriched by rules that enable them to populate them-self with instances extracted from unstructured PDF documents. In the paper the tractability of the SPO approach is proven. Moreover, features and behavior of the prototypical implementation of the SPO system are illustrated by means of a running example.