Toward a taxonomy of concepts using web documents structure

  • Authors:
  • Rim Zarrad;Narjes Doggaz;Ezzeddine Zagrouba

  • Affiliations:
  • University of Tunis El Manar, Tunis, Tunisia;University of Tunis El Manar, Tunis, Tunisia;University of Tunis El Manar, Tunis, Tunisia

  • Venue:
  • Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
  • Year:
  • 2012

Abstract

Due to the rise of the Web and the need for structured knowledge, an interesting line of research is the formalization of ontologies and the creation of conceptual taxonomies from Web documents. Traditional methods for ontology learning, especially those that extract domain concepts from a textual corpus, tend to favor analysis of the text itself, whether they follow a statistical or a linguistic approach. In this paper, we propose an approach that differs from the traditional ones in that it uses information about the document structure to extract relevant information. Our approach studies each material form in the text in order to extract the most relevant concepts of the ontology for a given field. The concepts are obtained by analyzing the occurrences of candidate terms in the titles and links of the documents and by considering the styles used. Our approach was tested on a French corpus of Web documents from the medical field. Preliminary results are encouraging and seem to validate our approach. We also present a new method for extracting the hierarchical links between the concepts of the ontology. The taxonomic links are established in three phases: the first, a linguistic step, is based on the canonical syntactic structure of the extracted concepts; the second applies lexico-syntactic patterns that convey the hypernymy relation; and the third analyzes the hierarchy of the titles in each document to extract taxonomic relations.
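
The idea of weighting candidate terms by where they occur in a document (titles, links, styled text) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the structural cues are combined, so the weights, the tag groupings, the class name StructureScorer, the function score_terms and the sample document are all assumptions made for illustration. Only Python's standard html.parser module is used.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: the abstract does not specify any numeric values.
WEIGHTS = {"title": 3.0, "link": 2.0, "style": 1.5, "body": 1.0}

# Assumed grouping of HTML tags into the structural cues named in the abstract.
TITLE_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6"}
LINK_TAGS = {"a"}
STYLE_TAGS = {"b", "strong", "i", "em", "u"}


class StructureScorer(HTMLParser):
    """Accumulates a weighted occurrence count for each candidate term."""

    def __init__(self, candidates):
        super().__init__()
        self.candidates = [c.lower() for c in candidates]
        self.open_tags = []      # stack of currently open tags
        self.scores = Counter()  # term -> weighted count

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def _context(self):
        # Titles take precedence over links, links over styled text.
        tags = set(self.open_tags)
        if tags & TITLE_TAGS:
            return "title"
        if tags & LINK_TAGS:
            return "link"
        if tags & STYLE_TAGS:
            return "style"
        return "body"

    def handle_data(self, data):
        text = data.lower()
        weight = WEIGHTS[self._context()]
        for term in self.candidates:
            # Naive substring count; a real system would tokenize and lemmatize.
            self.scores[term] += weight * text.count(term)


def score_terms(html, candidates):
    """Return a dict mapping each candidate term to its structure-weighted score."""
    scorer = StructureScorer(candidates)
    scorer.feed(html)
    scorer.close()
    return dict(scorer.scores)


# Made-up sample document, for illustration only.
SAMPLE = (
    "<html><head><title>Insuline</title></head><body>"
    "<h1>L'insuline</h1>"
    "<p>L'insuline est une <b>hormone</b>. Voir <a href='#'>maladie</a>.</p>"
    "</body></html>"
)

if __name__ == "__main__":
    print(score_terms(SAMPLE, ["insuline", "hormone", "maladie"]))
```

In this sketch a term that appears in a title contributes more than the same term in running text, so terms can be ranked by score and the top ones kept as candidate concepts. Any threshold or ranking cutoff would have to be tuned on a real corpus.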