Discovering and Analyzing World Wide Web Collections

Authors: Sougata Mukherjea
Affiliations: India Research Lab, IBM, India
Venue: Knowledge and Information Systems
Year: 2004

Abstract

With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to discover Web pages that are relevant to a topic. To address this problem we are developing a system that allows the collection and analysis of Web pages related to a particular topic. In this paper we present the system’s overall architecture and introduce the focused crawler used by the system. We also discuss the various techniques we use to allow the user to analyze and gain useful insights about a collection. Finally, we present some statistics on the collections.