User-oriented smart-cache for the Web: what you seek is what you get!

Authors:
Zoé Lacroix; Arnaud Sahuguet; Raman Chandrasekar
Affiliations:
Institute for Research in Cognitive Science, University of Pennsylvania; Computer and Information Science, University of Pennsylvania; Institute for Research in Cognitive Science & Center for the Advanced Study of India, University of Pennsylvania
Venue:
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Year:
1998

Abstract

Standard database approaches to querying information on the Web focus on the source(s) and provide a query language based on a given predefined organization (schema) of the data: this is the source-driven approach. However, can the Web be seen as a standard database? There is no super-user in charge of monitoring the source(s) (the data is constantly updated), there is no homogeneous structure (and thus no common explicit structure), the Web itself never stops growing, etc. For these reasons, we believe that the standard source-driven approach is not suitable for the Web.

As an alternative, we propose a user-oriented approach based on the idea that the schema is expressed a posteriori by the user's needs when asking a query. Given a user query, AKIRA (Agentive Knowledge-based Information Retrieval Architecture) [6] extracts a target structure (the structure expressed in the query) and uses standard information retrieval and filtering techniques to access potentially relevant documents.

The user-oriented paradigm means that the structure through which the data is viewed does not come from the source but is extracted from the user query. When a user asks a query, the relevant information is retrieved from the Web and stored as is in a cache. Then the information is extracted from the raw data using computational linguistic techniques. The AKIRA cache (smart-cache) represents these extracted layers of meta-information on top of the raw data. The smart-cache is an object-oriented database whose schema is inferred from the user's target structure. It is designed on demand through a library of concepts that can be assembled together to match concepts and meta-concepts required in the user's query. The smart-cache can be seen as a view of the Web.

To the best of our knowledge, AKIRA is the only system that uses information retrieval and extraction integrated with database techniques to provide maximum flexibility to the user and offer transparent access to the content of Web documents.
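
As a rough illustration of the schema-on-demand idea in the abstract, the Python sketch below infers a record type from the concepts named in a query, stores each document as is, and layers extracted values on top of the raw text. It is not AKIRA's implementation: the names `CONCEPT_LIBRARY`, `infer_schema`, and `build_smart_cache` are hypothetical, and simple regular expressions stand in for the computational linguistic extraction the abstract refers to.

```python
import re
from dataclasses import make_dataclass

# Hypothetical concept library: concept name -> extractor over raw text.
# Regexes are an assumption; they stand in for linguistic extraction.
CONCEPT_LIBRARY = {
    "year": lambda text: re.findall(r"\b(?:19|20)\d{2}\b", text),
    "email": lambda text: re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text),
}


def infer_schema(target_concepts):
    """Assemble a record type whose fields are the concepts the query asks for."""
    unknown = [c for c in target_concepts if c not in CONCEPT_LIBRARY]
    if unknown:
        raise KeyError(f"no concept in library for: {unknown}")
    # The raw document is always kept; one extra field per requested concept.
    fields = [("raw", str)] + [(c, list) for c in target_concepts]
    return make_dataclass("CachedDocument", fields)


def build_smart_cache(target_concepts, documents):
    """Store each raw document as is, plus one extracted layer per concept."""
    schema = infer_schema(target_concepts)
    return [
        schema(raw=doc, **{c: CONCEPT_LIBRARY[c](doc) for c in target_concepts})
        for doc in documents
    ]


docs = ["Contact ada@example.org about the 1998 workshop.", "No dates here."]
cache = build_smart_cache(["year", "email"], docs)
print([d.year for d in cache])
```

The point of the sketch is that the record type is not fixed in advance: a query naming only `year` would yield a cache whose entries have just `raw` and `year` fields, so the structure follows the query rather than the source.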