Accelerated focused crawling through online relevance feedback

Authors:
Soumen Chakrabarti; Kunal Punera; Mallela Subramanyam
Affiliations:
IIT Bombay; IIT Bombay; University of Texas, Austin
Venue:
Proceedings of the 11th international conference on World Wide Web
Year:
2002

Citing 24
Cited 65

References (24):
Automated learning of decision rules for text categorization

ACM Transactions on Information Systems (TOIS)
Information retrieval in the World-Wide Web: making client-based searching feasible

Selected papers of the first conference on World-Wide Web
Enhanced hypertext categorization using hyperlinks

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Combining labeled and unlabeled data with co-training

COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Efficient crawling through URL ordering

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The shark-search algorithm. An application: tailored Web site mapping

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Focused crawling: a new approach to topic-specific Web resource discovery

WWW '99 Proceedings of the eighth international conference on World Wide Web
Topic Distillation and Spectral Filtering

Artificial Intelligence Review - Special issue on data mining on the Internet
Topical locality in the Web

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
WTMS: a system for collecting and analyzing topic-specific Web information

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Adaptive Retrieval Agents: Internalizing Local Context and Scaling up to the Web

Machine Learning - Special issue on information retrieval
Intelligent crawling on the World Wide Web with arbitrary predicates

Proceedings of the 10th international conference on World Wide Web
Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction

Proceedings of the 10th international conference on World Wide Web
Exploring the Web with reconnaissance agents

Communications of the ACM
Evaluating topic-driven web crawlers

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Machine Learning

Machine Learning
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval
Using Reinforcement Learning to Spider the Web Efficiently

ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Regression by Classification

SBIA '96 Proceedings of the 13th Brazilian Symposium on Artificial Intelligence: Advances in Artificial Intelligence
Focused Crawling Using Context Graphs

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies

The VLDB Journal — The International Journal on Very Large Data Bases
Stochastic models for the Web graph

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Letizia: an agent that assists web browsing

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1

Cited by (65):
The structure of broad topics on the web

Proceedings of the 11th international conference on World Wide Web
Deriving link-context from HTML tag tree

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Using urls and table layout for web classification tasks

Proceedings of the 13th international conference on World Wide Web
Panorama: extending digital libraries with topical crawlers

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Extracting Precise Link Context Using NLP Parsing Technique

WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Probabilistic models for focused web crawling

Proceedings of the 6th annual ACM international workshop on Web information and data management
Exploiting Interclass Rules for Focused Crawling

IEEE Intelligent Systems
Learning to extract information from large domain-specific websites using sequential models

ACM SIGKDD Explorations Newsletter
A General Evaluation Framework for Topical Crawlers

Information Retrieval
Lexical and semantic clustering by web links

Journal of the American Society for Information Science and Technology - Special issue: Webometrics
Learning to crawl: Comparing classification schemes

ACM Transactions on Information Systems (TOIS)
Link Contexts in Classifier-Guided Topical Crawlers

IEEE Transactions on Knowledge and Data Engineering
Graph mining: Laws, generators, and algorithms

ACM Computing Surveys (CSUR)
Geographically focused collaborative crawling

Proceedings of the 15th international conference on World Wide Web
WebKhoj: Indian language IR from multiple character encodings

Proceedings of the 15th international conference on World Wide Web
To search or to crawl?: towards a query optimizer for text-centric tasks

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Structure-driven crawler generation by example

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Focused crawling guided by link context

AIA'06 Proceedings of the 24th IASTED international conference on Artificial intelligence and applications
Category ranking for personalized search

Data & Knowledge Engineering
Using HMM to learn user browsing patterns for focused web crawling

Data & Knowledge Engineering - Special issue: WIDM 2004
Combining classifiers to identify online databases

Proceedings of the 16th international conference on World Wide Web
An adaptive crawler for locating hidden-Web entry points

Proceedings of the 16th international conference on World Wide Web
First-order focused crawling

Proceedings of the 16th international conference on World Wide Web
K-relevance: a spectrum of relevance for data sources impacting a query

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Focused crawling with scalable ordinal regression solvers

Proceedings of the 24th international conference on Machine learning
Towards a query optimizer for text-centric tasks

ACM Transactions on Database Systems (TODS)
Accurate and efficient crawling for relevant websites

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
RankMass crawler: a crawler with high personalized pagerank coverage guarantee

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Building structured web community portals: a top-down, compositional, and incremental approach

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Urban web crawling

Proceedings of the first international workshop on Location and the web
Exploiting Multiple Features with MEMMs for Focused Web Crawling

NLDB '08 Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
An Ontology-Based Focused Crawler

NLDB '08 Proceedings of the 13th international conference on Natural Language and Information Systems: Applications of Natural Language to Information Systems
A Clustering Framework to Build Focused Web Crawlers for Automatic Extraction of Cultural Information

SETN '08 Proceedings of the 5th Hellenic conference on Artificial Intelligence: Theories, Models and Applications
Link-Contexts for Ranking

ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Crawling the Construction Web: A Machine-Learning Approach without Negative Examples

Applied Artificial Intelligence
Focused Crawling with Heterogeneous Semantic Information

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Querying structured information sources on the web

Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
Information Extraction

Foundations and Trends in Databases
A cross-language focused crawling algorithm based on multiple relevance prediction strategies

Computers & Mathematics with Applications
Topical web crawling using weighted anchor text and web page change detection techniques

WSEAS Transactions on Information Science and Applications
A framework to derive web page context from hyperlink structure

International Journal of Information and Communication Technology
Towards a universal marketplace over the web: statistical multi-label classification of service provider forms with simulated annealing

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Dynamic hypertext generation for reusing open corpus content

Proceedings of the 20th ACM conference on Hypertext and hypermedia
Improving the performance of focused web crawlers

Data & Knowledge Engineering
Automated ontology instantiation from tabular web sources: the AllRight system

Web Semantics: Science, Services and Agents on the World Wide Web
Adaptive geospatially focused crawling

Proceedings of the 18th ACM conference on Information and knowledge management
SCTWC: An online semi-supervised clustering approach to topical web crawlers

Applied Soft Computing
FICA: A novel intelligent crawling algorithm based on reinforcement learning

Web Intelligence and Agent Systems
Generation of Specifications Forms through Statistical Learning for a Universal Services Marketplace

WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Extracting content structure for web pages based on visual representation

APWeb'03 Proceedings of the 5th Asia-Pacific web conference on Web technologies and applications
Adaptive focused crawling

The adaptive web
Record extraction based on user feedback and document selection

APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Ontology-based focused crawling of deep web sources

KSEM'07 Proceedings of the 2nd international conference on Knowledge science, engineering and management
Mining the web with hierarchical crawlers – a resource sharing based crawling approach

International Journal of Intelligent Information and Database Systems
Querying structured information sources on the Web

International Journal of Metadata, Semantics and Ontologies
Application of structured document parsing to focused web crawling

Computer Standards & Interfaces
Focusing on novelty: a crawling strategy to build diverse language models

Proceedings of the 20th ACM international conference on Information and knowledge management
wHunter: a focused web crawler – a tool for digital library

ICADL'04 Proceedings of the 7th international Conference on Digital Libraries: international collaboration and cross-fertilization
Adaptive topical web crawling for domain-specific resource discovery guided by link-context

MICAI'06 Proceedings of the 5th Mexican international conference on Artificial Intelligence
Intelligent search on the internet

Reasoning, Action and Interaction in AI Theories and Systems
Querying web images by topic and example specification methods

ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications
Dynamic refinement of search engines results utilizing the user intervention

Journal of Systems and Software
E-FFC: an enhanced form-focused crawler for domain-specific deep web databases

Journal of Intelligent Information Systems
Competitive intelligence for SMEs: a web-based decision support system

International Journal of Business Information Systems
Topical crawling on the web through local site-searches

Journal of Web Engineering

Abstract

The organization of HTML into a tag tree structure, which browsers render as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on the links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page with respect to an information need, using only information available on the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier and reduce the number of irrelevant pages that are fetched and then discarded.
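The relevance-prediction idea in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' system: the tag-tree region of a link is approximated by its nearest enclosing HTML element; the learner is a two-class multinomial naive Bayes model, chosen here only for simplicity, updated online as fetched target pages are judged relevant or irrelevant (where those judgments come from is left open); and the class names `LinkContextParser` and `OnlineNaiveBayes`, the `a:`/`p:` feature prefixes, and the example page are all hypothetical. Unvisited URLs go into a `heapq` min-heap keyed by negated log-odds, so the link predicted most relevant is fetched first.

```python
import heapq
import math
import re
from collections import Counter
from html.parser import HTMLParser

TOKEN = re.compile(r"[a-z0-9]+")
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class LinkContextParser(HTMLParser):
    """Hypothetical: collects (href, features) per <a href>: anchor tokens ("a:") plus
    tokens of the nearest enclosing element ("p:"), a crude stand-in for a tag-tree region."""

    def __init__(self):
        super().__init__()
        self.stack = [{"tag": "#root", "href": None, "tokens": [], "links": []}]
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            href = dict(attrs).get("href") if tag == "a" else None
            self.stack.append({"tag": tag, "href": href, "tokens": [], "links": []})

    def handle_data(self, data):
        self.stack[-1]["tokens"].extend(TOKEN.findall(data.lower()))

    def handle_endtag(self, tag):
        if not any(node["tag"] == tag for node in self.stack[1:]):
            return  # stray end tag (or void element): ignore
        while True:  # also closes elements the HTML left open
            node = self.stack.pop()
            self._close(node)
            if node["tag"] == tag:
                break

    def _close(self, node):
        parent = self.stack[-1]
        parent["tokens"].extend(node["tokens"])  # text flows up the tag tree
        if node["tag"] == "a" and node["href"]:
            parent["links"].append((node["href"], node["tokens"]))
        else:
            self._emit(node)

    def _emit(self, node):
        for href, anchor in node["links"]:
            feats = ["a:" + t for t in anchor] + ["p:" + t for t in node["tokens"]]
            self.links.append((href, feats))
        node["links"] = []

    def close(self):
        super().close()
        while len(self.stack) > 1:  # flush unclosed elements at end of input
            self._close(self.stack.pop())
        self._emit(self.stack[0])


class OnlineNaiveBayes:
    """Two-class multinomial naive Bayes, updated one labeled example at a time."""

    def __init__(self):
        self.docs, self.totals = [1, 1], [0, 0]  # docs: add-one class prior
        self.counts, self.vocab = [Counter(), Counter()], set()

    def update(self, feats, relevant):
        c = int(relevant)
        self.docs[c] += 1
        self.counts[c].update(feats)
        self.totals[c] += len(feats)
        self.vocab.update(feats)

    def log_odds(self, feats):
        """log P(relevant | feats) - log P(irrelevant | feats), Laplace-smoothed."""
        v = len(self.vocab) + 1  # +1 reserves mass for unseen features
        score = math.log(self.docs[1] / self.docs[0])
        for f in feats:
            score += math.log((self.counts[1][f] + 1) / (self.totals[1] + v))
            score -= math.log((self.counts[0][f] + 1) / (self.totals[0] + v))
        return score


if __name__ == "__main__":
    model = OnlineNaiveBayes()
    # Hypothetical feedback: link features whose targets were judged relevant or not.
    model.update(["a:crawling", "p:crawler"], relevant=True)
    model.update(["a:travel", "p:flights"], relevant=False)
    parser = LinkContextParser()
    parser.feed('<ul><li>Focused crawler papers <a href="/a">crawling survey</a></li>'
                '<li>Cheap flights <a href="/b">travel deals</a></li></ul>')
    parser.close()
    frontier = []  # heapq is a min-heap, so push negated scores
    for href, feats in parser.links:
        heapq.heappush(frontier, (-model.log_odds(feats), href))
    print(heapq.heappop(frontier)[1])  # /a
```

Because the model is updated one example at a time, each relevance judgment on a fetched page can immediately affect how links discovered afterwards are ranked; re-scoring URLs already waiting in the frontier would need an extra pass that this sketch omits.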