Transforming arbitrary tables into logical form with TARTAR

Authors:
Aleksander Pivk;Philipp Cimiano;York Sure;Matjaz Gams;Vladislav Rajkovič;Rudi Studer
Affiliations:
Jozef Stefan Institute, Department of Intelligent Systems, Jamova 39, 1000 Ljubljana, Slovenia and Institute AIFB, University of Karlsruhe, Karlsruhe, Germany;Institute AIFB, University of Karlsruhe, Karlsruhe, Germany;Institute AIFB, University of Karlsruhe, Karlsruhe, Germany;Jozef Stefan Institute, Department of Intelligent Systems, Jamova 39, 1000 Ljubljana, Slovenia;Faculty of Organizational Sciences, University of Maribor, Kranj, Slovenia and Jozef Stefan Institute, Department of Intelligent Systems, Jamova 39, 1000 Ljubljana, Slovenia;Institute AIFB, University of Karlsruhe, Karlsruhe, Germany and Research Center for Information Technologies (FZI), Karlsruhe, Germany
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

The tremendous success of the World Wide Web is offset by the effort needed to search for and find relevant information. For tabular structures embedded in HTML documents, typical keyword-based or link-analysis-based search fails. The Semantic Web relies on annotating resources such as documents by means of ontologies and aims to overcome the bottleneck of finding relevant information. Turning the current Web into a Semantic Web requires automatic approaches to annotation, since manual approaches will not scale in general. Most efforts have been devoted to the automatic generation of ontologies from text, with quite limited success. Tabular structures require additional effort, mainly because understanding a table's contents requires comprehending its logical structure on the one hand and its semantic interpretation on the other. This paper focuses on the automatic transformation of table-like structures into semantic (F-Logic) frames. The presented work consists of a methodology, an accompanying implementation (called TARTAR), and a thorough evaluation. It is based on a grounded cognitive table model, which the methodology instantiates step by step. A typical application scenario is the automatic population of ontologies to enable query answering over arbitrary tables (e.g. HTML tables).
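
To make the idea of table-to-frame transformation concrete, the following is a minimal sketch and not the TARTAR methodology itself. It assumes the simplest possible case: a table with a single header row in which every data row describes one instance. The function name `table_to_frames`, the helper names, and the instance-naming scheme are all hypothetical; the output only imitates the general shape of F-Logic frames (`object:Class[attribute -> value]`).

```python
from typing import List, Sequence


def _attribute_name(header_cell: str) -> str:
    """Turn a header cell into a lower-case, underscore-separated identifier."""
    words = "".join(ch if ch.isalnum() else " " for ch in header_cell).split()
    return "_".join(word.lower() for word in words)


def _literal(cell: str) -> str:
    """Emit the cell unquoted if Python's float() accepts it, else as a quoted string."""
    text = cell.strip()
    try:
        float(text)
        return text
    except ValueError:
        return '"' + text.replace('"', '\\"') + '"'


def table_to_frames(
    class_name: str,
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
) -> List[str]:
    """Render each data row as one F-Logic-style frame (hypothetical, simplified).

    Assumption: one header row, one instance per data row. Real tables with
    nested headers or row-wise orientation need the structural analysis that
    the abstract describes and that this sketch does not attempt.
    """
    attributes = [_attribute_name(cell) for cell in header]
    frames = []
    for index, row in enumerate(rows, start=1):
        if len(row) != len(attributes):
            raise ValueError(f"row {index} has {len(row)} cells, expected {len(attributes)}")
        slots = "; ".join(
            f"{attribute} -> {_literal(cell)}" for attribute, cell in zip(attributes, row)
        )
        # Instance identifiers such as "tour1" are an illustrative naming choice.
        frames.append(f"{class_name.lower()}{index}:{class_name}[{slots}].")
    return frames
```

Calling `table_to_frames("Tour", ["Destination", "Price (EUR)"], [["Paris", "120"]])` would produce one frame for the single row, with `destination` and `price_eur` as its attributes. The choice of a class name is left to the caller here, whereas deciding what a table is about is part of the semantic interpretation the paper addresses.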