We present an automatic genre-based Web page classification system. Unlike subject- or topic-based classification, genre-based classification focuses on functional purpose and assigns Web pages to categories such as online shopping, technical paper, or discussion forum. Genre classification has remained underdeveloped because genres, their features, and even the categories themselves are subjective and difficult to define. We define five top-level genre categories, each with several subcategories, and develop new methods to extract 31 features from Web pages to identify these categories. We analyze not only the content of the Web pages but also their URLs, HTML tags, JavaScript, and VBScript. The resulting genre classification system achieves an average accuracy of 93%. In addition, we combine this genre classification with our subject-based classification to produce a comprehensive Web page classification system.
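As a rough illustration of drawing features from a page's URL, HTML tags, and scripts, the following minimal Python sketch computes a handful of counts. The abstract does not list the 31 features, so every feature name here (`url_depth`, `num_forms`, `mentions_cart`, and so on) and the `extract_features` helper are hypothetical stand-ins rather than the authors' feature set. The resulting vector could be passed to any standard classifier.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts start tags and records the declared language of each <script>."""

    def __init__(self):
        super().__init__()
        self.tag_counts = {}
        self.script_langs = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        if tag == "script":
            attr = dict(attrs)
            # Assumption: a script with no declared language is JavaScript.
            lang = attr.get("language") or attr.get("type") or "javascript"
            self.script_langs.append(lang.lower())


def extract_features(url, html):
    """Return a dict of illustrative (hypothetical) genre features."""
    parsed = urlparse(url)
    counter = TagCounter()
    counter.feed(html)
    tags = counter.tag_counts
    return {
        # URL-based features
        "url_depth": len([p for p in parsed.path.split("/") if p]),
        "url_has_query": int(bool(parsed.query)),
        # HTML tag features
        "num_links": tags.get("a", 0),
        "num_forms": tags.get("form", 0),
        "num_inputs": tags.get("input", 0),
        "num_images": tags.get("img", 0),
        # Script features
        "num_javascript": sum("javascript" in s for s in counter.script_langs),
        "num_vbscript": sum("vbscript" in s for s in counter.script_langs),
        # Content feature: a crude cue for shopping pages
        "mentions_cart": int("add to cart" in html.lower()),
    }


if __name__ == "__main__":
    page = (
        '<html><body><a href="/">Home</a>'
        '<form action="/buy"><input type="submit" value="Add to cart"></form>'
        '<script language="VBScript">x = 1</script></body></html>'
    )
    print(extract_features("http://shop.example.com/books/item.asp?id=42", page))
```

A page with many form inputs, a query-string URL, and shopping phrases would plausibly lean toward an online-shopping genre, whereas a page dominated by text and few forms might suggest a technical paper. Those associations are illustrative guesses, not results from the paper.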