Learning to Understand Information on the Internet: An Example-Based Approach

Authors:
Mike Perkowitz;Robert B. Doorenbos;Oren Etzioni;Daniel S. Weld
Affiliations:
Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195-2350. E-mail: map@cs.washington.edu, bobd@cs.washington.edu, etzioni@cs.washington.edu, we ...
Venue:
Journal of Intelligent Information Systems - Special issue: next generation information technologies and systems
Year:
1997

Citing 11
Cited 18

Mediators in the Architecture of Future Information Systems
Computer

Agents that reduce work and information overload
Communications of the ACM

A softbot-based interface to the Internet
Communications of the ACM

Analysis of adaptation and environment
Artificial Intelligence - Special volume on computational research on interaction and agency, part 2

Data model and query evaluation in global information systems
Journal of Intelligent Information Systems - Special issue: networked information discovery and retrieval

Learning Syntax by Automata Induction
Machine Learning

The Design of Discrimination Experiments
Machine Learning

Estimating the Quality of Databases
FQAS '98 Proceedings of the Third International Conference on Flexible Query Answering Systems

FAQ finder: a case-based approach to knowledge navigation
CAIA '95 Proceedings of the 11th Conference on Artificial Intelligence for Applications

Letizia: an agent that assists web browsing
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1

Planning to gather information
AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 1

A scalable comparison-shopping agent for the World-Wide Web
AGENTS '97 Proceedings of the first international conference on Autonomous agents

Computational aspects of resilient data extraction from semistructured sources (extended abstract)
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Probe, count, and classify: categorizing hidden web databases
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

QProber: A system for automatic classification of hidden-Web databases
ACM Transactions on Information Systems (TOIS)

Design and Implementation of the Physical Layer in WebBases: The XRover Experience
CL '00 Proceedings of the First International Conference on Computational Logic

On Precision and Recall of Multi-Attribute Data Extraction from Semistructured Sources
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

On the complexity of schema inference from web pages in the presence of nullable data attributes
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management

WinAgent: a system for creating and executing personal information assistants using a web browser
Proceedings of the 9th international conference on Intelligent user interfaces

Learning query languages of Web interfaces
Proceedings of the 2004 ACM symposium on Applied computing

Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering

Clustering e-commerce search engines based on their search interface pages using WISE-cluster
Data & Knowledge Engineering - Special issue: WIDM 2004

Classification-aware hidden-web text database selection
ACM Transactions on Information Systems (TOIS)

Computational tools for fluid power system design: towards distributed AI and virtual reality
International Journal of Computer Applications in Technology

Learning semantic definitions of online information sources
Journal of Artificial Intelligence Research

Deploying information agents on the web
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Using some web content mining techniques for Arabic text classification
DNCOCO'09 Proceedings of the 8th WSEAS international conference on Data networks, communications, computers

Automatically Constructing Semantic Web Services from Online Sources
ISWC '09 Proceedings of the 8th International Semantic Web Conference

Identity trail: covert surveillance using DNS
PET'07 Proceedings of the 7th international conference on Privacy enhancing technologies

Abstract

The explosive growth of the Web has made intelligent software assistants increasingly necessary for ordinary computer users. Both traditional approaches (search engines, hierarchical indices) and intelligent software agents require significant amounts of human effort to keep up with the Web. As an alternative, we investigate the problem of automatically learning to interact with information sources on the Internet. We report on ShopBot and ILA, two implemented agents that learn to use such resources. ShopBot learns how to extract information from online vendors using only minimal knowledge about product domains. Given the home pages of several online stores, ShopBot autonomously learns how to shop at those vendors. After its learning is complete, ShopBot is able to speedily visit over a dozen software stores and CD vendors, extract product information, and summarize the results for the user. ILA learns to translate information from Internet sources into its own internal concepts. ILA builds a model of an information source that specifies the translation between the source's output and ILA's model of the world. ILA is capable of leveraging a small amount of knowledge about a domain to learn models of many information sources. We show that ILA's learning is fast and accurate, requiring only a small number of queries per information source.
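
The idea of learning a translation between a source's output and an internal model can be illustrated with a toy sketch. This is not the authors' algorithm; it is a minimal illustration under stated assumptions. The attribute names, the `toy_directory` source, and the `learn_field_mapping` helper are all hypothetical. The agent queries the source about a few objects it already knows and keeps, for each output position, the internal attribute whose known value matches in every response.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical internal model: objects whose attributes the agent already knows.
KNOWN_PEOPLE: List[Dict[str, str]] = [
    {"firstname": "Ada", "lastname": "Lovelace",
     "email": "ada@example.org", "phone": "555-0100"},
    {"firstname": "Alan", "lastname": "Turing",
     "email": "alan@example.org", "phone": "555-0101"},
]


def toy_directory(lastname: str) -> Sequence[str]:
    """Stand-in for an Internet source that returns unlabeled fields."""
    table = {
        "Lovelace": ("Ada", "555-0100", "ada@example.org"),
        "Turing": ("Alan", "555-0101", "alan@example.org"),
    }
    return table[lastname]


def learn_field_mapping(
    source: Callable[[str], Sequence[str]],
    known: List[Dict[str, str]],
    query_attr: str,
) -> Dict[int, Optional[str]]:
    """Map each output position to the internal attribute it agrees with.

    A position is mapped only if exactly one attribute matches in every
    response; otherwise it is left as None (unexplained or ambiguous).
    """
    # One query per known object, keyed on the chosen query attribute.
    responses = [source(obj[query_attr]) for obj in known]
    width = min(len(r) for r in responses)
    mapping: Dict[int, Optional[str]] = {}
    for i in range(width):
        candidates = [
            attr for attr in known[0]
            if all(r[i] == obj[attr] for r, obj in zip(responses, known))
        ]
        mapping[i] = candidates[0] if len(candidates) == 1 else None
    return mapping


mapping = learn_field_mapping(toy_directory, KNOWN_PEOPLE, "lastname")
print(mapping)
```

In this toy setting two queries are enough to label all three output positions, which loosely mirrors the claim that only a small number of queries per source is needed. A real source would require tolerant matching rather than exact string equality.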