Building Quality-Based Views of the Web

Authors:
Enrico Triolo;Nicola Polettini;Diego Sona;Paolo Avesani
Affiliations:
Fondazione Bruno Kessler (FBK-IRST), 38050, Trento, Italy;Fondazione Bruno Kessler (FBK-IRST), 38050, Trento, Italy and DIT, University of Trento, 38050, Trento, Italy;Fondazione Bruno Kessler (FBK-IRST), 38050, Trento, Italy;Fondazione Bruno Kessler (FBK-IRST), 38050, Trento, Italy
Venue:
AI*IA '07 Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence on AI*IA 2007: Artificial Intelligence and Human-Oriented Computing
Year:
2007

Abstract

Because of the rapid growth of information available on the Web, retrieving relevant content is increasingly hard. The complexity of the task lies both in the semantics of the content and in filtering for quality sources. A recent strategy for coping with the overwhelming amount of information is to focus search on a snapshot of the Internet, namely a Web view. In this paper, we present a system that supports the creation of a quality-based view of the Web. We give a brief overview of the software and its functional architecture, with more emphasis on the role of AI in supporting the organization of Web resources into a hierarchical structure of categories. We survey our recent work on document classifiers that address a twofold challenge. On one side, the task is to recommend classifications of Web resources when the taxonomy provides no classified examples, which usually happens when taxonomies are built from scratch. On the other side, even when taxonomies are populated, classifiers are trained with few examples, because once a category accumulates a certain number of Web resources, the organization policy usually suggests refining the taxonomy. The paper includes a short description of a couple of case studies in which the system has been deployed in real-world applications.