Among HTML elements, tables [RHJ98] encapsulate hierarchically structured data (hierarchical data, for short) in a tabular layout. HTML tables have no rigid schema, and the HTML grammar accepts almost any form of two-dimensional table. This flexibility complicates retrieving hierarchical data from HTML tables. In this paper, we propose an automated approach for retrieving hierarchical data from HTML tables. The approach constructs the content tree of an HTML table, which captures the intended hierarchy of the table's data, without requiring the table's internal structure to be known beforehand. Users of the content tree retrieve the desired data without dealing with HTML tags. Our approach can be employed by (i) a query language designed to retrieve hierarchically structured data extracted from HTML tables or other sources, (ii) a processor that converts HTML tables into XML documents, and (iii) a data warehouse that collects hierarchical data from HTML tables and stores materialized views of them. The time complexity of the proposed retrieval approach is proportional to the number of HTML elements in the HTML table.