Most web page classifiers rely on features extracted from page content, so a page must be downloaded before it can be classified. We propose a technique that clusters web pages using their URLs exclusively. In contrast to other proposals, we analyze only features that lie outside the page; hence, we do not need to download a page to classify it. The technique is also unsupervised and requires little intervention from the user. Furthermore, we do not need to crawl a site extensively to build a classifier for it; a small subset of its pages suffices. We evaluated our classifier in an experiment over 21 highly visited websites and obtained good precision and recall.
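To make the idea of URL-only, unsupervised grouping concrete, the following is a minimal sketch and not the method described in this paper. It assumes a deliberately simple rule: numeric path segments are replaced by a wildcard, and URLs that share the resulting pattern fall into the same cluster. The names `url_signature` and `cluster_urls`, the `<num>` wildcard, and the `example.com` URLs are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def url_signature(url: str) -> tuple:
    """Reduce a URL to a coarse pattern using only the URL string itself."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Assumed abstraction rule: purely numeric segments become a wildcard.
    pattern = tuple("<num>" if s.isdigit() else s.lower() for s in segments)
    # Keep query parameter names (not values) as part of the pattern.
    params = tuple(sorted(p.split("=", 1)[0] for p in parts.query.split("&") if p))
    return (parts.netloc.lower(), pattern, params)


def cluster_urls(urls):
    """Group URLs that share a signature; no page is ever downloaded."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_signature(url)].append(url)
    return dict(clusters)


# Hypothetical sample of URLs taken from a single site.
sample = [
    "http://example.com/product/123",
    "http://example.com/product/456",
    "http://example.com/author/profile?id=7",
    "http://example.com/author/profile?id=9",
    "http://example.com/help",
]
clusters = cluster_urls(sample)
for signature, members in clusters.items():
    print(signature, len(members))
```

In this sketch no labels are supplied and no page body is fetched, which mirrors the two properties the abstract emphasizes; a real system would need a far richer pattern model than a single numeric wildcard.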