Web sites today serve many different functions: corporate sites, search engines, e-stores, and so forth. Because sites are created for different purposes, their structure and connectivity characteristics vary. This research argues, however, that sites with similar roles exhibit similar structural patterns, since the functionality of a site naturally induces a typical hyperlinked structure and typical connectivity patterns to and from the rest of the Web. The functionality of a Web site is thus reflected in a set of structural and connectivity-based features that form a typical signature. In this paper, we automatically categorize sites into eight distinct functional classes and highlight several search-engine-related applications that could make immediate use of such technology. We purposely restrict our categorization algorithms to connectivity and structural data alone, making no use of content analysis. When two classification algorithms were applied to a set of 202 sites drawn from the eight functional categories, they correctly classified between 54.5% and 59% of the sites. On some categories, classification precision exceeded 85%. An additional result of this work is that the structural signature can be used to detect spam rings and mirror sites by clustering sites with almost identical signatures.
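As a rough illustration of the last point, grouping sites whose signatures are almost identical might be sketched as follows. The abstract does not specify the features, the distance measure, or the clustering method, so everything here is an assumption made for illustration: the three-value signature vectors, the Euclidean distance, the `threshold` value, and the example host names are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length signature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_near_identical(signatures, threshold=0.05):
    """Group sites whose signatures lie within `threshold` of each other.

    `signatures` maps a site name to a tuple of structural features.
    Groups are formed transitively with a small union-find; only groups
    of two or more sites are returned. The threshold is an assumed value.
    """
    parent = {site: site for site in signatures}

    def find(site):
        # Follow parent links to the group representative (path halving).
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    for a, b in combinations(signatures, 2):
        if distance(signatures[a], signatures[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for site in signatures:
        groups.setdefault(find(site), []).append(site)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical signatures: three made-up, normalized structural features.
signatures = {
    "a.example": (0.80, 0.10, 0.30),
    "b.example": (0.81, 0.10, 0.29),
    "c.example": (0.80, 0.11, 0.30),
    "shop.example": (0.20, 0.70, 0.55),
}
suspects = cluster_near_identical(signatures)
```

In this toy input the first three sites have nearly identical signatures and end up in one group, while the fourth is left out. The pairwise comparison is quadratic in the number of sites, which is fine for a set of a few hundred sites but would need an index or hashing scheme at Web scale.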