The longstanding problem of automatic table interpretation still eludes us. Its solution would aid not only table processing applications such as large-volume table conversion but also related problems such as information extraction, semantic annotation, and semi-structured data management. In this paper, we offer a solution for the common special case in which so-called sibling pages are available. The sibling pages we consider are pages on the hidden web, commonly generated from underlying databases. Our system compares them to identify and connect nonvarying components (category labels) and varying components (data values). We tested our solution using more than 2000 tables in source pages from three different domains: car advertisements, molecular biology, and geopolitical information. Experimental results show that the system can successfully identify sibling tables, generate structure patterns, interpret tables using the generated patterns, and automatically adjust the structure patterns as it processes a sequence of hidden-web pages. For these activities, the system achieved an overall F-measure of 94.5%. Further, given that we can automatically interpret tables, we show that this leads immediately to a conceptualization of the data in these interpreted tables and thus also to a way to semantically annotate these interpreted tables with respect to the ontological conceptualization. Labels in nested table structures yield ontological concepts and interrelationships among these concepts, and associated data values become annotated information. We further show that semantically annotated data leads immediately to queryable data. Thus, the entire process, which is fully automatic, transforms facts embedded within tables into facts accessible by standard query engines.
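The following minimal Python sketch illustrates the core idea of sibling-page comparison. It is not the authors' system, and it makes several simplifying assumptions:

- The two sibling tables are already aligned grids of identical shape.
- Cells whose text is identical on both pages are treated as category labels.
- Cells that differ are treated as data values.
- Each data value is annotated with the nearest label above it and the nearest label to its left.

The function names (`classify_cells`, `nearest_label`, `interpret`) and that label-attachment heuristic are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[str]]
Fact = Tuple[Tuple[str, ...], str]


def classify_cells(page: Grid, sibling: Grid) -> List[List[bool]]:
    """Mark a cell True (nonvarying, i.e. a candidate category label) when its
    text is identical in both sibling tables, False (varying data value) otherwise.

    Hypothetical simplification: the two tables are assumed to be already
    aligned grids of the same shape.
    """
    if len(page) != len(sibling) or any(
        len(rp) != len(rs) for rp, rs in zip(page, sibling)
    ):
        raise ValueError("sibling tables must be aligned grids of the same shape")
    return [
        [cp.strip() == cs.strip() for cp, cs in zip(rp, rs)]
        for rp, rs in zip(page, sibling)
    ]


def nearest_label(grid: Grid, is_label: List[List[bool]],
                  r: int, c: int, dr: int, dc: int) -> Optional[str]:
    """Step from (r, c) in direction (dr, dc) and return the first non-empty
    label cell encountered, or None if the edge of the table is reached."""
    r, c = r + dr, c + dc
    while 0 <= r < len(grid) and 0 <= c < len(grid[r]):
        if is_label[r][c] and grid[r][c].strip():
            return grid[r][c].strip()
        r, c = r + dr, c + dc
    return None


def interpret(page: Grid, sibling: Grid) -> List[Fact]:
    """Pair each varying cell of `page` with its label path:
    the nearest label above it (column label) first, then the nearest
    label to its left (row label)."""
    is_label = classify_cells(page, sibling)
    facts: List[Fact] = []
    for r, row in enumerate(page):
        for c, cell in enumerate(row):
            if is_label[r][c] or not cell.strip():
                continue  # skip category labels and empty cells
            col_label = nearest_label(page, is_label, r, c, -1, 0)
            row_label = nearest_label(page, is_label, r, c, 0, -1)
            path = tuple(lbl for lbl in (col_label, row_label) if lbl)
            facts.append((path, cell.strip()))
    return facts


if __name__ == "__main__":
    # Two hypothetical sibling pages from a geopolitical site: same layout,
    # same labels, different data values.
    page = [["Year", "Population"], ["2000", "10.2M"], ["2010", "11.0M"]]
    sibling = [["Year", "Population"], ["2000", "5.1M"], ["2010", "5.6M"]]
    facts = interpret(page, sibling)
    print(facts)
    # prints [(('Population', '2000'), '10.2M'), (('Population', '2010'), '11.0M')]

    # Once values are annotated with their label paths, a plain dictionary
    # already answers a simple query such as "population in 2010".
    index = dict(facts)
    print(index[("Population", "2010")])  # prints 11.0M
```

In the example, the year cells match across the two pages, so they are classified as labels. Each population figure therefore receives a two-level label path, loosely mirroring how labels in nested table structures become related concepts.

The obvious weakness of a single pairwise comparison is that a data value which happens to coincide on two siblings (for example, the same car make on both pages) would be misread as a label. The abstract describes the actual system as adjusting its structure patterns as it processes a sequence of hidden-web pages; this sketch does nothing of the kind.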