Digital web library of a website with document clustering

  • Authors:
  • Isabel Mahecha-Nieto; Elizabeth León

  • Affiliations:
  • Universidad Nacional de Colombia, Departamento de Ingeniería de Sistemas e Industrial, Bogotá, Colombia; Universidad Nacional de Colombia, Departamento de Ingeniería de Sistemas e Industrial, Bogotá, Colombia

  • Venue:
  • IBERAMIA'10 Proceedings of the 12th Ibero-American conference on Advances in artificial intelligence
  • Year:
  • 2010

Abstract

Digital libraries make it possible to organize, classify and publish collections of electronic content available on computers or networks. They are also easy to use and configure, and they offer a graphical user interface for fast searching and browsing over a document repository. This article presents a digital library prototype for retrieving, indexing and clustering the documents published on a website. The website may include unstructured, semi-structured and structured documents, such as web pages, scientific papers, news items and documents in several formats that consist essentially of text. The proposed prototype includes a clustering process that uses a conceptual algorithm and an a priori cluster-labeling process. Preliminary results were obtained from tests on different sets of documents published on a real website.
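The abstract does not specify the conceptual clustering algorithm or how the a priori labeling works. As a rough illustration of what "a priori" labeling can mean (choosing cluster labels before documents are assigned to clusters), the Python sketch below indexes each document as a term-frequency vector, picks the k terms that occur in the most documents as candidate labels, and then assigns each document to the label it mentions most often. This is a simplified, hypothetical stand-in, not the authors' method: the function names, the stop-word list and the labeling heuristic are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "is", "with"}
UNLABELED = "(unlabeled)"  # cannot collide with a token: tokens have no parentheses


def tokenize(text: str) -> list[str]:
    """Lower-case the text, keep alphabetic tokens (incl. Spanish letters), drop stop words."""
    return [t for t in re.findall(r"[a-záéíóúüñ]+", text.lower()) if t not in STOP_WORDS]


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Indexing step: map each document id to its term-frequency vector."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def choose_labels(index: dict[str, Counter], k: int) -> list[str]:
    """A priori labeling (hypothetical heuristic): pick labels BEFORE clustering,
    here the k terms with the highest document frequency, ties broken alphabetically."""
    doc_freq = Counter()
    for tf in index.values():
        doc_freq.update(tf.keys())  # count each term once per document
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cluster(index: dict[str, Counter], labels: list[str]) -> dict[str, list[str]]:
    """Assign each document to the label it mentions most; documents that
    mention none of the labels go to a separate unlabeled group."""
    clusters: dict[str, list[str]] = {label: [] for label in labels}
    clusters[UNLABELED] = []
    for doc_id, tf in index.items():
        # max() keeps the first label in `labels` order when counts tie.
        best = max(labels, key=lambda lab: tf[lab]) if labels else None
        if best is not None and tf[best] > 0:
            clusters[best].append(doc_id)
        else:
            clusters[UNLABELED].append(doc_id)
    return clusters


if __name__ == "__main__":
    sample = {
        "p1": "Clustering of web documents",
        "p2": "Document clustering for digital libraries",
        "p3": "Digital library search interface",
        "p4": "News about the conference",
    }
    idx = build_index(sample)
    labels = choose_labels(idx, k=2)
    print(labels)
    print(cluster(idx, labels))
```

Choosing labels first keeps every cluster readable by construction, at the cost of documents that share none of the chosen terms ending up unlabeled; a real system would typically use richer candidate labels (phrases, concepts) than single frequent terms.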