HTML [Rag96,Sei96] is a well-accepted and widely used language for creating platform-independent documents to be posted on the Web, and, by the HTML specification, HTML documents are semistructured. We propose a tool, called WebView, that constructs the semistructured data graph (SDG) of an HTML document H, capturing the internal structure of the data embedded in H and in the documents that H links to directly or indirectly. On top of the SDG, WebView provides a query processor that evaluates SQL-like queries posed against the SDG (i.e., against the source document(s)) to extract information from it. Existing methods for extracting structured information from HTML documents with a static internal structure, such as wrappers and integrators for data warehousing, can benefit from WebView.
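The abstract does not say how WebView builds the SDG or evaluates queries, so the following is only a minimal Python sketch under our own assumptions, not WebView's algorithm. It treats every element and every non-blank text run as a node, adds a containment edge from each element to its children, and adds a hyperlink edge from an `<a href>` to the root of the linked page when that page is available. `Node`, `build_sdg`, `select`, `text_of`, and the `pages` dictionary (standing in for fetching linked documents over the network) are hypothetical names introduced for illustration. `select` merely mimics `SELECT text FROM label WHERE predicate`; it does not parse SQL.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """One vertex of the (hypothetical) data graph: an element or a text value."""

    def __init__(self, label, attrs=None, text=None, doc=None):
        self.label = label            # tag name, "#text", or "#document"
        self.attrs = dict(attrs or [])
        self.text = text              # only set for "#text" nodes
        self.doc = doc                # URL of the page the node came from
        self.children = []            # containment edges, in document order
        self.links = []               # hyperlink edges to other pages' roots


class _Builder(HTMLParser):
    """Turns one page into a containment tree rooted at a '#document' node."""

    def __init__(self, url):
        super().__init__()
        self.url = url
        self.root = Node("#document", doc=url)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, doc=self.url)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].label == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip(), doc=self.url))


def iter_nodes(root, follow_links=True):
    """Depth-first walk over containment edges and, optionally, hyperlink edges."""
    visited, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in visited:
            continue
        visited.add(id(node))
        yield node
        if follow_links:
            stack.extend(node.links)          # visited after this node's subtree
        stack.extend(reversed(node.children))  # children popped in document order


def build_sdg(url, pages, seen=None):
    """Parse pages[url] and, recursively, every page it links to that is in `pages`."""
    seen = {} if seen is None else seen
    if url in seen:
        return seen[url]
    builder = _Builder(url)
    seen[url] = builder.root  # register before recursing so link cycles terminate
    builder.feed(pages[url])
    builder.close()
    for node in iter_nodes(builder.root, follow_links=False):
        href = node.attrs.get("href")
        if node.label == "a" and href in pages:
            node.links.append(build_sdg(href, pages, seen))
    return builder.root


def text_of(node):
    """Concatenate the text values under a node, staying within its own page."""
    return " ".join(n.text for n in iter_nodes(node, follow_links=False) if n.label == "#text")


def select(sdg, label, where=lambda n: True):
    """Rough analogue of: SELECT text FROM <label> WHERE where(node)."""
    return [text_of(n) for n in iter_nodes(sdg) if n.label == label and where(n)]


if __name__ == "__main__":
    pages = {
        "index.html": '<html><body><h1>Staff</h1><a href="people.html">People</a></body></html>',
        "people.html": "<table><tr><td>Alice</td><td>Databases</td></tr>"
                       "<tr><td>Bob</td><td>Networks</td></tr></table>",
    }
    sdg = build_sdg("index.html", pages)
    print(select(sdg, "td"))  # ['Alice', 'Databases', 'Bob', 'Networks']
    print(select(sdg, "tr", where=lambda n: text_of(n).startswith("Bob")))  # ['Bob Networks']
```

Registering a page's root in `seen` before following its links, together with the visited set in `iter_nodes`, lets the sketch handle pages that link back to each other without looping. A real tool would also fetch pages by URL and resolve relative links, both of which this sketch leaves out.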