Most existing methods for retrieving sub-page-level information from an HTML document either rely on keyword-based search or assume that the document's internal structure is known beforehand. These techniques, however, are poorly suited to locating hierarchically organized information, especially when the internal structure of the given HTML document D is unknown. We present an approach for inferring the information hierarchy at the sub-page level of D. The approach constructs a metadata model of D, called a content tree (CT), that captures the hierarchical relationships among the data contents of D. Hierarchical information in D can then be retrieved via the CT using an existing semistructured query language.
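
To make the notion of a content tree concrete, the following is a minimal illustrative sketch, not the construction method of the paper, which the abstract does not specify. It assumes, purely for illustration, that heading tags (h1 to h6) signal the hierarchy, and it nests each heading under the nearest preceding heading of a higher rank. The names Node, HeadingTreeBuilder, build_tree, and paths are hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """One entry of the illustrative content tree."""

    def __init__(self, label, level):
        self.label = label
        self.level = level      # 0 for the root, 1-6 for h1-h6
        self.children = []


class HeadingTreeBuilder(HTMLParser):
    """Builds a tree from heading tags only (an assumed heuristic)."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}

    def __init__(self):
        super().__init__()
        self.root = Node("ROOT", 0)
        self.stack = [self.root]    # path from the root to the latest node
        self.current_level = None   # set while inside a heading tag
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current_level = self.HEADINGS[tag]
            self.buffer = []

    def handle_data(self, data):
        # Collect text only while inside a heading.
        if self.current_level is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self.current_level is not None:
            node = Node("".join(self.buffer).strip(), self.current_level)
            # Pop until the top of the stack outranks the new heading.
            while self.stack[-1].level >= node.level:
                self.stack.pop()
            self.stack[-1].children.append(node)
            self.stack.append(node)
            self.current_level = None


def build_tree(html):
    builder = HeadingTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def paths(node, prefix=()):
    """Lists every root-to-node label path in depth-first order."""
    out = []
    for child in node.children:
        path = prefix + (child.label,)
        out.append(path)
        out.extend(paths(child, path))
    return out
```

Under this assumption, a hierarchical lookup becomes a path match over the tree: for example, paths(build_tree(page)) can be filtered for paths that begin with a given top-level label. Documents that encode hierarchy through tables, lists, or font changes rather than headings would need a different inference step.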