Web Communities Defined by Web Page Content
WI-IAT '08 Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
DTU: A Decision Tree for Uncertain Data
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Web Site Description Based on Genres and Web Design Patterns
SOCINFO '09 Proceedings of the 2009 International Workshop on Social Informatics
Classifying Web Pages by Genre: An n-Gram Approach
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
Intent-Based Categorization of Search Results Using Questions from Web Q&A Corpus
WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
A web personalizing technique using adaptive data structures: The case of bursts in web visits
Journal of Systems and Software
A Comprehensive Study of Features and Algorithms for URL-Based Topic Classification
ACM Transactions on the Web (TWEB)
Cross-lingual genre classification
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
As the number of web pages grows, it becomes very difficult to find the desired information quickly among the thousands of pages retrieved by a search engine. To address this problem, many studies propose classifying documents by genre, a criterion distinct from topic. Most of these works assign a document to only one genre. In this paper we propose a new, flexible approach to document genre categorization. Flexibility means that our approach assigns a document to all predefined genres, each with a different weight. The proposed approach combines two homogeneous classifiers: a contextual classifier, which uses the URL, and a structural classifier, which uses the document structure. Both are centroid-based classifiers. Experiments yield a micro-averaged break-even point (BEP) above 85%, which exceeds the results obtained by other categorization approaches.
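The idea of combining two centroid-based classifiers into per-genre weights can be illustrated with a minimal Python sketch. The abstract does not specify the features, the similarity measure, or the combination rule, so everything below is an assumption made for illustration: URLs are represented by character trigrams, document structure by HTML tag counts, similarity is cosine, and the two weight vectors are mixed linearly with a hypothetical parameter `alpha`.

```python
import math
from collections import Counter, defaultdict

def char_ngrams(text, n=3):
    """Character n-gram counts of a URL (assumed contextual feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """One centroid per genre: the mean of that genre's training vectors."""

    def fit(self, vectors, labels):
        sums = defaultdict(Counter)
        counts = Counter(labels)
        for vec, label in zip(vectors, labels):
            sums[label].update(vec)  # adds feature counts
        self.centroids = {
            g: {k: v / counts[g] for k, v in s.items()} for g, s in sums.items()
        }
        return self

    def weights(self, vec):
        """Return a weight for every genre, normalized to sum to 1."""
        sims = {g: cosine(vec, c) for g, c in self.centroids.items()}
        total = sum(sims.values())
        if total == 0:
            return {g: 1.0 / len(sims) for g in sims}
        return {g: s / total for g, s in sims.items()}

def combine(w_contextual, w_structural, alpha=0.5):
    """Linear mix of the two weight vectors (assumed combination rule)."""
    return {
        g: alpha * w_contextual[g] + (1 - alpha) * w_structural[g]
        for g in w_contextual
    }

# Toy training data, invented for illustration only.
labels = ["eshop", "eshop", "blog", "blog"]
urls = [
    "http://shop.example.com/cart",
    "http://shop.example.com/product/12",
    "http://blog.example.org/2009/post",
    "http://blog.example.org/archive",
]
tag_counts = [
    Counter({"form": 3, "img": 6, "p": 1}),
    Counter({"form": 2, "img": 8, "p": 2}),
    Counter({"p": 12, "a": 5, "img": 1}),
    Counter({"p": 9, "a": 7}),
]

contextual = CentroidClassifier().fit([char_ngrams(u) for u in urls], labels)
structural = CentroidClassifier().fit(tag_counts, labels)

new_url = "http://shop.example.net/cart"
new_tags = Counter({"form": 2, "img": 5, "p": 1})
combined = combine(
    contextual.weights(char_ngrams(new_url)),
    structural.weights(new_tags),
)
print(combined)  # one weight per genre rather than a single label
```

Returning a weight for every genre, instead of taking the arg-max, is what makes the assignment flexible in the sense the abstract describes; a single label can still be recovered by picking the genre with the largest weight.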