Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby pages to improve performance. In contrast, in this work we utilize a weighted combination of the contents of neighbors to generate a better virtual document for classification. In addition, we break pages into fields, finding that a weighted combination of text from the target and fields of neighboring pages is able to reduce classification error by more than a third. We demonstrate performance on a large dataset of pages from the Open Directory Project and validate the approach using pages from a crawl from the Stanford WebBase. Interestingly, we find no value in anchor text and unexpected value in page titles (and especially titles of parent pages) in the virtual document.
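As a rough illustration of the virtual-document idea described above, the following Python sketch merges weighted term counts from a target page's fields with those of its neighbors. The field names, the neighbor relation names, and every weight value are hypothetical placeholders chosen for the example; the abstract does not state the weights or the exact combination scheme used in the work, so this should be read as one plausible reading rather than the authors' method.

```python
from collections import Counter

# Hypothetical weights for illustration only; the source gives no values.
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}
NEIGHBOR_WEIGHTS = {"parent": 0.5, "child": 0.25}


def term_counts(text):
    """Return term frequencies using lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def add_weighted(combined, fields, scale):
    """Add each field's term counts to `combined`, scaled by field weight * scale."""
    for field, text in fields.items():
        weight = scale * FIELD_WEIGHTS.get(field, 0.0)
        if weight == 0.0:
            continue  # unknown fields contribute nothing in this sketch
        for term, count in term_counts(text).items():
            combined[term] = combined.get(term, 0.0) + weight * count


def virtual_document(target, neighbors):
    """Build a weighted term vector from a target page and its neighbors.

    target:    dict mapping field name -> text
    neighbors: list of (relation, fields) pairs, with fields shaped like target
    """
    combined = {}
    add_weighted(combined, target, 1.0)
    for relation, fields in neighbors:
        add_weighted(combined, fields, NEIGHBOR_WEIGHTS.get(relation, 0.0))
    return combined


# Usage: one target page and one parent page (made-up text).
target = {"title": "Jaguar cars", "body": "luxury cars and engines"}
neighbors = [("parent", {"title": "Car makers", "body": "list of makers"})]
vdoc = virtual_document(target, neighbors)
```

The resulting dictionary can be passed to any classifier that accepts sparse term-weight features. In practice the weights would be tuned on held-out data, which is presumably how one would discover which fields, such as parent-page titles, carry the most value.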