Mining the Web's Link Structure

Authors:
Soumen Chakrabarti;Byron E. Dom;S. Ravi Kumar;Prabhakar Raghavan;Sridhar Rajagopalan;Andrew Tomkins;David Gibson;Jon Kleinberg
Venue:
Computer
Year:
1999

Citing 9
Cited 127

Enhanced hypertext categorization using hyperlinks

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Database techniques for the World-Wide Web: a survey

ACM SIGMOD Record
Improved algorithms for topic distillation in a hyperlinked environment

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Trawling the Web for emerging cyber-communities

WWW '99 Proceedings of the eighth international conference on World Wide Web
Focused crawling: a new approach to topic-specific Web resource discovery

WWW '99 Proceedings of the eighth international conference on World Wide Web
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Introduction to Modern Information Retrieval

Introduction to Modern Information Retrieval

A practical hypertext categorization method using links and incrementally available class information

SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Recent results in automatic Web resource discovery

ACM Computing Surveys (CSUR)
Web mining research: a survey

ACM SIGKDD Explorations Newsletter
Intelligent crawling on the World Wide Web with arbitrary predicates

Proceedings of the 10th international conference on World Wide Web
Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction

Proceedings of the 10th international conference on World Wide Web
Cover story: structural Web search using a graph-based discovery system

intelligence
SALSA: the stochastic approach for link-structure analysis

ACM Transactions on Information Systems (TOIS)
Enhanced topic distillation using text, markup tags, and hyperlinks

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
On the design of a learning crawler for topical resource discovery

ACM Transactions on Information Systems (TOIS)
PageRate: counting Web users' votes

Proceedings of the 12th ACM conference on Hypertext and Hypermedia
Small-world linkage and co-linkage

Proceedings of the 12th ACM conference on Hypertext and Hypermedia
Building efficient and effective metasearch engines

ACM Computing Surveys (CSUR)
TalkMine: a soft computing approach to adaptive knowledge recommendation

Soft computing agents
I/O-efficient techniques for computing pagerank

Proceedings of the eleventh international conference on Information and knowledge management
Entropy-based link analysis for mining web informative structures

Proceedings of the eleventh international conference on Information and knowledge management
Studying Recommendation Algorithms by Graph Analysis

Journal of Intelligent Information Systems
PIPE: Web Personalization by Partial Evaluation

IEEE Internet Computing
Streaming-Media Knowledge Discovery

Computer
Data Mining for Web Intelligence

Computer
Characterizing the Citation Graph as a Self-Organizing Networked Information Space

IICS '02 Proceedings of the Second International Workshop on Innovative Internet Computing Systems
Computing Geographical Scopes of Web Resources

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Visual Ranking of Link Structures

WADS '01 Proceedings of the 7th International Workshop on Algorithms and Data Structures
Building an Information and Knowledge Fusion System

Proceedings of the 14th International conference on Industrial and engineering applications of artificial intelligence and expert systems: engineering of intelligent systems
A Method for Discovering Purified Web Communities

DS '01 Proceedings of the 4th International Conference on Discovery Science
Data Mining of User Navigation Patterns

WEBKDD '99 Revised Papers from the International Workshop on Web Usage Analysis and User Profiling
Comparison of Three Vertical Search Spiders

Computer
An approach to confidence based page ranking for user oriented Web search

ACM SIGMOD Record
Re-ranking search results using network analysis: a case study with Google

CASCON '02 Proceedings of the 2002 conference of the Centre for Advanced Studies on Collaborative research
An analysis of Internet content delivery systems

ACM SIGOPS Operating Systems Review - OSDI '02: Proceedings of the 5th symposium on Operating systems design and implementation
Data mining for hypertext: a tutorial survey

ACM SIGKDD Explorations Newsletter
Implicit link analysis for small web search

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Topic distillation using hierarchy concept tree

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Compressing the Graph Structure of the Web

DCC '01 Proceedings of the Data Compression Conference
WebReader: a Mechanism for Automating the Search and Collecting Information from the World Wide Web

WISE '00 Proceedings of the First International Conference on Web Information Systems Engineering (WISE'00) - Volume 2
Searching the hypermedia web: improved topic distillation through network analytic relevance ranking

The New Review of Hypermedia and Multimedia - Hypermedia and the world wide web
Critical and future trends in data mining: a review of key data mining technologies/applications

Data mining
Graph-based hierarchical conceptual clustering

The Journal of Machine Learning Research
THESUS: Organizing Web document collections based on link semantics

The VLDB Journal — The International Journal on Very Large Data Bases
Mining Web Informative Structures and Contents Based on Entropy Analysis

IEEE Transactions on Knowledge and Data Engineering
SEWeP: using site semantics and a taxonomy to enhance the Web personalization process

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Simulation in Web data management

Applied system simulation
Characterizing Web Usage Regularities with Information Foraging Agents

IEEE Transactions on Knowledge and Data Engineering
What's new on the web?: the evolution of the web from a search engine perspective

Proceedings of the 13th international conference on World Wide Web
Link fusion: a unified link analysis framework for multi-type interrelated data objects

Proceedings of the 13th international conference on World Wide Web
LinkSelector: A Web mining approach to hyperlink selection for Web portals

ACM Transactions on Internet Technology (TOIT)
Managing distributed collections: evaluating web page changes, movement, and replacement

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
On Leveraging User Access Patterns for Topic Specific Crawling

Data Mining and Knowledge Discovery
Web Mining: Research and Practice

Computing in Science and Engineering
Similarity spreading: a unified framework for similarity calculation of interrelated objects

Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters
Knowledge portals and the emerging digital knowledge workplace

IBM Systems Journal
Recommender Systems Research: A Connection-Centric Survey

Journal of Intelligent Information Systems
TSSP: A Reinforcement Algorithm to Find Related Papers

WI '04 Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence
Local methods for estimating pagerank values

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Node ranking in labeled directed graphs

Proceedings of the thirteenth ACM international conference on Information and knowledge management
Effect of different network analysis strategies on search engine re-ranking

CASCON '04 Proceedings of the 2004 conference of the Centre for Advanced Studies on Collaborative research
Modeling of growing networks with directional attachment and communities

Neural Networks
Rank-Stability and Rank-Similarity of Link-Based Web Ranking Algorithms in Authority-Connected Graphs

Information Retrieval
An analysis of internet content delivery systems

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Toward a basic framework for webometrics

Journal of the American Society for Information Science and Technology - Special issue: Webometrics
Trend detection through temporal link analysis

Journal of the American Society for Information Science and Technology - Special issue: Webometrics
User feedback based enhancement in web search quality

Information Sciences—Informatics and Computer Science: An International Journal
Language identification in web pages

Proceedings of the 2005 ACM symposium on Applied computing
Algorithmic foundations of the internet

ACM SIGACT News
SimFusion: measuring similarity using unified relationship matrix

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Clustering web pages based on their structure

Data & Knowledge Engineering - Special issue: WIDM 2003
Higher-Order Web Link Analysis Using Multilinear Algebra

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
An ontology-based information retrieval system

IEA/AIE'2003 Proceedings of the 16th international conference on Developments in applied artificial intelligence
A data mining course for computer science: primary sources and implementations

Proceedings of the 37th SIGCSE technical symposium on Computer science education
The web structure of e-government - developing a methodology for quantitative evaluation

Proceedings of the 15th international conference on World Wide Web
Topical link analysis for web search

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
The design and evaluation of accessibility on web navigation

Decision Support Systems
Web search engine based on DNS

Journal of Network and Computer Applications
Web outlier mining: Discovering outliers from web datasets

Intelligent Data Analysis
Extraction and classification of dense communities in the web

Proceedings of the 16th international conference on World Wide Web
Different Aspects of Social Network Analysis

WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
Collaborative classifier agents: studying the impact of learning in distributed document classification

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Semantic Web approach to smart link generation for Web navigations

Software—Practice & Experience
A machine learning approach to web page filtering using content and structure analysis

Decision Support Systems
Data mining from 1994 to 2004: an application-orientated review

International Journal of Business Intelligence and Data Mining
A distributed, fault-tolerant multi-agent web mining system for scalable web search

AIC'05 Proceedings of the 5th WSEAS International Conference on Applied Informatics and Communications
Extracting and ranking viral communities using seeds and content similarity

Proceedings of the nineteenth ACM conference on Hypertext and hypermedia
Separate and inequal: preserving heterogeneity in topical authority flows

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
Weighted graphs and disconnected components: patterns and a generator

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
S-SimRank: Combining Content and Link Information to Cluster Papers Effectively and Efficiently

ADMA '08 Proceedings of the 4th international conference on Advanced Data Mining and Applications
Mining Context-Specific Web Knowledge: An Experimental Dictionary-Based Approach

ICIC '08 Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Artificial Intelligence
Collection-level analysis tools for books online

Proceedings of the 2008 ACM workshop on Research advances in large digital book repositories
Using Web Clustering for Web Communities Mining and Analysis

WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Extraction and classification of dense implicit communities in the Web graph

ACM Transactions on the Web (TWEB)
Adaptive Web Sites: A Knowledge Extraction from Web Data Approach

Proceedings of the 2008 conference on Adaptive Web Sites: A Knowledge Extraction from Web Data Approach
Site-Wide Wrapper Induction for Life Science Deep Web Databases

DILS '09 Proceedings of the 6th International Workshop on Data Integration in the Life Sciences
From whence does your authority come?: utilizing community relevance in ranking

AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Web page ranking based on fuzzy and learning automata

Proceedings of the International Conference on Management of Emergent Digital EcoSystems
A brief survey of computational approaches in social computing

IJCNN'09 Proceedings of the 2009 international joint conference on Neural Networks
Design and evaluation of improvement method on the web information navigation - A stochastic search approach

Decision Support Systems
Relevance feedback using weight propagation compared with information-theoretic query expansion

ECIR'07 Proceedings of the 29th European conference on IR research
Web document modeling

The adaptive web
Application of the pagerank algorithm to alarm graphs

ICICS'07 Proceedings of the 9th international conference on Information and communications security
Design of SMACA: synthesis and its analysis through rule vector graph for web based application

International Journal of Intelligent Information and Database Systems
Mining the web with hierarchical crawlers – a resource sharing based crawling approach

International Journal of Intelligent Information and Database Systems
Structure vs. content in hierarchical corpora

Information Retrieval
Automatic sitemaps generation: Exploring website structures using block extraction and hyperlink analysis

Expert Systems with Applications: An International Journal
Developing web intelligence using data mining

CIMMACS'07 Proceedings of the 6th WSEAS international conference on Computational intelligence, man-machine systems and cybernetics
Architecture for a parallel focused crawler for clickstream analysis

ACIIDS'11 Proceedings of the Third international conference on Intelligent information and database systems - Volume Part I
An architecture for a focused trend parallel Web crawler with the application of clickstream analysis

Information Sciences: an International Journal
HSWS: enhancing efficiency of web search engine via semantic web

Proceedings of the International Conference on Management of Emergent Digital EcoSystems
A similarity reinforcement algorithm for heterogeneous web pages

APWeb'05 Proceedings of the 7th Asia-Pacific web conference on Web Technologies Research and Development
A similarity-aware multiagent-based web content management scheme

ICMLC'05 Proceedings of the 4th international conference on Advances in Machine Learning and Cybernetics
Information assistant: an initiative topic search engine

ICMLC'05 Proceedings of the 4th international conference on Advances in Machine Learning and Cybernetics
Hybrid approach to web content outlier mining without query vector

DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
Ontology facilitated community navigation – who is interesting for what i am interested in?

CONTEXT'05 Proceedings of the 5th international conference on Modeling and Using Context
User experience evaluation of Google search for obtaining medical knowledge: a case study

International Journal of Data Mining and Bioinformatics
IKUM: an integrated web personalization platform based on content structures and user behavior

ITWP'03 Proceedings of the 2003 international conference on Intelligent Techniques for Web Personalization
Integrating web content clustering into web log association rule mining

AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Algorithmic foundations of the internet: foreword

CAAN'04 Proceedings of the First international conference on Combinatorial and Algorithmic Aspects of Networking
Search engines and web information retrieval

CAAN'04 Proceedings of the First international conference on Combinatorial and Algorithmic Aspects of Networking
FlexiRank: an algorithm offering flexibility and accuracy for ranking the web pages

ICDCIT'05 Proceedings of the Second international conference on Distributed Computing and Internet Technology
Index ordering by query-independent measures

Information Processing and Management: an International Journal
A survey on swarm and evolutionary algorithms for web mining applications

SEMCCO'11 Proceedings of the Second international conference on Swarm, Evolutionary, and Memetic Computing - Volume Part II
Web document clustering using hyperlink structures

Computational Statistics & Data Analysis
Generation of SMACA and its application in web services

PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Construction of Domain Ontologies: Sourcing the World Wide Web

International Journal of Intelligent Information Technologies
A Data-Driven Approach to Measure Web Site Navigability

Journal of Management Information Systems
Person attribute extraction from the textual parts of web pages

Acta Cybernetica
Architecture specification of rule-based deep web crawler with indexer

International Journal of Knowledge and Web Intelligence
Web link-based relationships among top European universities

Journal of Information Science
The Erdős webgraph server

Discrete Applied Mathematics
A multi-level network analysis of web-citations among the world's universities

Scientometrics

Quantified Score

Hi-index	4.10

Abstract

The Web is a hypertext body of approximately 300 million pages that continues to grow at roughly a million pages per day. Page variation is more prodigious than the data's raw scale: taken as a whole, the set of Web pages lacks a unifying structure and shows far more authoring style and content variation than that seen in traditional text-document collections. This level of complexity makes an "off-the-shelf" database-management and information-retrieval solution impossible. To date, index-based search engines for the Web have been the primary tool by which users search for information. Such engines can build giant indices that let you quickly retrieve the set of all Web pages containing a given word or string. Experienced users can make effective use of such engines for tasks that can be solved by searching for tightly constrained keywords and phrases. These search engines are, however, unsuited for a wide range of equally important tasks. In particular, a topic of any breadth will typically contain several thousand or even several million relevant Web pages. How then, from this sea of pages, should a search engine select the correct ones, those of most value to the user?
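
The index-based retrieval described in the abstract can be illustrated with a minimal inverted index. This is only a sketch under stated assumptions, not the design of any particular search engine: the function names `build_index` and `lookup`, the whitespace tokenization, and the three sample pages are all hypothetical. It shows that such an index returns every page containing a word, with no indication of which pages are most valuable.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        # Naive whitespace tokenization (an assumption made for brevity).
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def lookup(index, word):
    """Return the sorted ids of all pages containing the word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample pages, invented for illustration only.
pages = {
    "a.html": "Harvard admissions and campus news",
    "b.html": "campus maps for visitors",
    "c.html": "news about web search engines",
}
index = build_index(pages)
print(lookup(index, "campus"))
```

Every page that mentions the query word comes back in an arbitrary (here alphabetical) order. For a broad topic with thousands of matches, that unranked set is exactly the "sea of pages" the abstract refers to.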