Intelligent web navigation

Authors:
Inma Hernández
Affiliations:
Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Sevilla, Spain
Venue:
FDIA'09 Proceedings of the Third BCS-IRSG conference on Future Directions in Information Access
Year:
2009


Abstract

Virtual integration systems retrieve information that matches a user's interests. The information comes from several web applications but is presented to the user in a uniform way, in an online process, so response time is a significant factor. An essential part of any information retrieval system is navigating through pages. Web pages usually contain a large number of links; some lead to interesting information, but most serve other purposes, such as advertising or internal site navigation. Traditional crawlers follow every link on each page in order to analyze the target page and classify it as interesting or irrelevant. This means retrieving, analyzing, and classifying thousands of pages for every single site, which is costly. The problem can be addressed by combining a web page classifier, which distinguishes interesting pages from irrelevant ones, with a link classifier, which automatically identifies links that lead to interesting pages. This kind of navigation is more efficient and less costly than that of traditional crawlers. Moreover, the navigation model is extracted automatically from the site instead of being handcrafted, which reduces the supervision required from the user.
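
The contrast between exhaustive crawling and classifier-guided navigation can be illustrated with a minimal sketch. It is not the author's implementation: the toy site, the URLs, and the two keyword rules standing in for the page classifier and the link classifier (`is_interesting_page`, `is_promising_link`) are all assumptions made for illustration. A real system would presumably use trained classifiers and fetch pages over HTTP.

```python
from collections import deque

# Hypothetical toy site: URL -> (page text, list of (anchor text, target URL)).
SITE = {
    "/": ("Home", [("Products", "/products"), ("Advertising", "/ads"),
                   ("About us", "/about")]),
    "/products": ("Product list", [("Product: lamp", "/p/1"),
                                   ("Product: desk", "/p/2"), ("Home", "/")]),
    "/ads": ("Sponsored content", [("Buy now", "/ads/1")]),
    "/ads/1": ("Sponsored offer", []),
    "/about": ("Company information", [("Home", "/")]),
    "/p/1": ("Product detail: lamp, price 20", []),
    "/p/2": ("Product detail: desk, price 90", []),
}


def is_interesting_page(text):
    """Stand-in page classifier: a keyword rule, not a trained model."""
    return "product detail" in text.lower()


def is_promising_link(anchor):
    """Stand-in link classifier: decides from the anchor text alone."""
    return "product" in anchor.lower()


def crawl(start, follow_link):
    """Breadth-first navigation; returns (pages fetched, interesting URLs)."""
    seen, queue = {start}, deque([start])
    fetched, hits = 0, []
    while queue:
        url = queue.popleft()
        text, links = SITE[url]  # simulates downloading the page
        fetched += 1
        if is_interesting_page(text):
            hits.append(url)
        for anchor, target in links:
            # The link is judged before its target page is ever retrieved.
            if target not in seen and follow_link(anchor):
                seen.add(target)
                queue.append(target)
    return fetched, hits


# Traditional crawler: follows every link.
exhaustive = crawl("/", lambda anchor: True)
# Guided navigation: follows only links the link classifier accepts.
guided = crawl("/", is_promising_link)

print("exhaustive:", exhaustive)
print("guided:    ", guided)
```

On this toy site both strategies find the same two interesting pages, but the guided run fetches four pages instead of seven, because the advertising and "About us" branches are never downloaded. The saving depends entirely on how accurate the link classifier is; a link wrongly rejected hides every page behind it.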