Most documents available on the Web conform to the HTML specification and are hierarchically structured. Existing data models for the Web either fail to capture this hierarchical structure or represent it only at a very low level. How to represent and query HTML documents at a higher level is therefore an important issue. In this paper, we first propose a novel conceptual model for HTML. The model has only a few simple constructs, yet it can represent the complex hierarchical structure of HTML documents at a level close to how humans conceptualize and visualize them. We also describe how to convert HTML documents into representations based on this conceptual model. Together, the model and the conversion method capture the essence (i.e., the semistructure) of HTML documents in a natural and simple way. Based on this conceptual model, we then present a rule-based language for querying HTML documents over the Internet. The language provides a simple but powerful way to query both intra-document and inter-document structures and allows query results to be restructured. Being rule-based, it naturally supports negation and recursion and is therefore more expressive than SQL-based languages. A logical semantics is also provided.
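The abstract does not give the model's constructs or the rule syntax, so the following is only an illustrative sketch in Python, not the authors' model or language. It assumes that heading tags (`h1` to `h6`) delimit nested sections, builds a tree from them with the standard-library `html.parser`, and then runs one recursive query (all subsections under a titled section) and one query with negation (sections with no text of their own). All names (`OutlineBuilder`, `parse_outline`, `descendants`, `titles_under`, `sections_without_text`, `SAMPLE`) are hypothetical, and inter-document (link) structure is not covered.

```python
from html.parser import HTMLParser


class OutlineBuilder(HTMLParser):
    """Assumed simplification: each <h1>..<h6> opens a section nested by level."""

    def __init__(self):
        super().__init__()
        self.root = {"title": "ROOT", "level": 0, "text": [], "children": []}
        self.stack = [self.root]      # path from the root to the open section
        self.heading_level = None     # set while inside a heading tag
        self.buffer = []              # collects the heading's text

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_level = int(tag[1])
            self.buffer = []

    def handle_endtag(self, tag):
        if self.heading_level is not None and tag == f"h{self.heading_level}":
            level = self.heading_level
            # Close every open section at the same or a deeper level.
            while self.stack[-1]["level"] >= level:
                self.stack.pop()
            node = {"title": "".join(self.buffer).strip(), "level": level,
                    "text": [], "children": []}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
            self.heading_level = None

    def handle_data(self, data):
        if self.heading_level is not None:
            self.buffer.append(data)
        elif data.strip():
            # Non-heading text belongs to the innermost open section.
            self.stack[-1]["text"].append(data.strip())


def parse_outline(html):
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def descendants(node):
    """Recursive part: yield all sections below `node`, in document order."""
    for child in node["children"]:
        yield child
        yield from descendants(child)


def titles_under(root, title):
    """Titles of all sections nested (at any depth) under sections named `title`."""
    return [d["title"]
            for s in descendants(root) if s["title"] == title
            for d in descendants(s)]


def sections_without_text(root):
    """Negation part: sections for which no text of their own was found."""
    return [s["title"] for s in descendants(root) if not s["text"]]


SAMPLE = ("<h1>Products</h1><p>Intro</p>"
          "<h2>Books</h2><p>Novels</p><h3>Used</h3>"
          "<h2>Music</h2><p>CDs</p>"
          "<h1>Contact</h1><p>Email us</p>")

if __name__ == "__main__":
    outline = parse_outline(SAMPLE)
    print(titles_under(outline, "Products"))
    print(sections_without_text(outline))
```

A rule-based language would state these two queries declaratively as rules; the Python functions only mimic the effect of recursion and negation over a hierarchical view of the document.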