In this paper, we identify and analyze structural properties that reflect the functionality of a Web site. These properties capture the size, the organization, the composition of URLs, and the link structure of Web sites. In contrast to previous work, we perform a comprehensive measurement study of the relation between the structure and the functionality of Web sites. Our study focuses on five of the most relevant functional classes: Academic, Blog, Corporate, Personal, and Shop. It is based on more than 1,400 Web sites comprising 7 million crawled and 47 million known Web pages. We present a detailed statistical analysis that shows how structural properties can be used to distinguish between Web sites from different functional classes. Building on these results, we introduce a content-independent approach for the automated, coarse-grained classification of Web sites. A naïve Bayes classifier with advanced density estimation achieves a precision of 82% and a recall of 80% when classifying Web sites into the considered classes.
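The abstract does not state which density estimator the classifier uses. A minimal sketch, assuming per-class, per-feature Gaussian kernel density estimation (one common choice, not necessarily the authors' method), shows how a naïve Bayes classifier can work on numeric structural features instead of page content. The class name `KDENaiveBayes`, the bandwidth rule, and the two toy features (log10 of page count, mean URL path depth) with their values are hypothetical illustrations, not data from the study.

```python
import math
from collections import defaultdict


class KDENaiveBayes:
    """Hypothetical naive Bayes with a Gaussian KDE per class and per feature."""

    def fit(self, X, y):
        # Group training vectors by class label.
        by_class = defaultdict(list)
        for xi, yi in zip(X, y):
            by_class[yi].append(xi)
        n = len(X)
        self.priors = {c: len(rows) / n for c, rows in by_class.items()}
        self.samples, self.bandwidths = {}, {}
        for c, rows in by_class.items():
            cols = list(zip(*rows))  # one tuple of training values per feature
            self.samples[c] = cols
            self.bandwidths[c] = [self._bandwidth(col) for col in cols]
        return self

    @staticmethod
    def _bandwidth(values):
        # Normal-reference rule of thumb h = 1.06 * sigma * n^(-1/5);
        # the floor keeps h > 0 when all values of a feature are identical.
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
        return max(1.06 * math.sqrt(var) * n ** (-0.2), 1e-3)

    @staticmethod
    def _log_density(x, values, h):
        # KDE: average of Gaussian kernels centred on each training value.
        norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
        density = norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        return math.log(density + 1e-300)  # guard against log(0) on underflow

    def predict(self, x):
        best, best_score = None, -math.inf
        for c, prior in self.priors.items():
            # Naive independence assumption: add per-feature log-densities.
            score = math.log(prior) + sum(
                self._log_density(xj, col, h)
                for xj, col, h in zip(x, self.samples[c], self.bandwidths[c])
            )
            if score > best_score:
                best, best_score = c, score
        return best


# Toy usage with hypothetical features [log10(page count), mean URL depth].
X = [[1.5, 3.0], [1.7, 3.2], [1.6, 2.9],
     [3.2, 1.5], [3.0, 1.4], [3.4, 1.6]]
y = ["Blog", "Blog", "Blog", "Shop", "Shop", "Shop"]
clf = KDENaiveBayes().fit(X, y)
print(clf.predict([1.6, 3.1]), clf.predict([3.1, 1.5]))
```

Replacing a single Gaussian per feature with a kernel density estimate lets each class-conditional distribution be skewed or multimodal, which structural measures such as site size often are; discretization-based estimators are another option for the same role.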