The Web graph is a giant social network whose properties have been measured and modeled extensively in recent years. Most such studies concentrate on the graph structure alone and do not consider the textual properties of the nodes. Consequently, Web communities have been characterized purely in terms of graph structure rather than page content. We propose that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities. In particular, using a topic taxonomy and an automatic classifier, we can measure the background distribution of broad topics on the Web and analyze how well recent random-walk algorithms draw samples that follow such distributions. In addition, we can measure the probability that a page about one broad topic links to a page about another broad topic. Extending this experiment, we can measure how quickly topic context is lost while walking randomly on the Web graph. Estimates of this topic mixing distance may explain why a global PageRank is still meaningful in the context of broad queries. In general, our measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.
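The idea of a "topic mixing distance" can be illustrated with a toy model. The sketch below makes a strong simplifying assumption that is not taken from the measurements described above: the topic of the next page on a random walk depends only on the topic of the current page, so the walk becomes a Markov chain over topics with a hypothetical row-stochastic matrix `T`, where `T[i][j]` is the probability that a link from a topic-`i` page leads to a topic-`j` page. Under that assumption, the background topic distribution is the chain's stationary distribution, and the mixing distance is the number of steps after which a walk started on one topic is within total-variation distance `eps` of that background. The names `step`, `stationary`, `tv_distance`, `mixing_distance`, and `eps`, and all matrix values, are illustrative only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

# ASSUMPTION (toy model): topics form a Markov chain with hypothetical
# row-stochastic link matrix T, where T[i][j] = P(next page has topic j |
# current page has topic i). Real Web walks need not behave this way.


def step(p: Vector, T: Matrix) -> Vector:
    """One random-walk step in topic space: p' = p T."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]


def stationary(T: Matrix, iters: int = 10_000, tol: float = 1e-12) -> Vector:
    """Background topic distribution via power iteration.

    Assumes the chain is irreducible and aperiodic, so iteration converges.
    """
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(iters):
        q = step(p, T)
        if max(abs(a - b) for a, b in zip(p, q)) < tol:
            return q
        p = q
    return p


def tv_distance(p: Vector, q: Vector) -> float:
    """Total-variation distance between two topic distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_distance(T: Matrix, start: int, eps: float = 0.05,
                    max_steps: int = 1000) -> int:
    """Smallest t such that a walk started on topic `start` has a topic
    distribution within total-variation `eps` of the background after t steps.

    Returns -1 if that does not happen within `max_steps`.
    """
    pi = stationary(T)
    p = [0.0] * len(T)
    p[start] = 1.0  # the walk begins on a page of topic `start`
    for t in range(1, max_steps + 1):
        p = step(p, T)
        if tv_distance(p, pi) <= eps:
            return t
    return -1


if __name__ == "__main__":
    # Hypothetical 3-topic matrices: "sticky" topics link mostly to themselves.
    loose = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    sticky = [[0.95, 0.025, 0.025], [0.025, 0.95, 0.025], [0.025, 0.025, 0.95]]
    print(mixing_distance(loose, start=0))   # topic context lost quickly
    print(mixing_distance(sticky, start=0))  # topic context persists longer
```

In this toy model, stronger topic self-linking (the `sticky` matrix) yields a larger mixing distance, which is the intuition behind measuring how far topic context survives along a walk. A short mixing distance would mean a walk soon forgets its starting topic, which is one way to think about why a single global ranking could still be informative for broad queries; the sketch does not reproduce any measured values.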