Toward a taxonomy of concepts using web documents structure

Authors:
Rim Zarrad;Narjes Doggaz;Ezzeddine Zagrouba
Affiliations:
University of Tunis El Manar, Tunis, Tunisia;University of Tunis El Manar, Tunis, Tunisia;University of Tunis El Manar, Tunis, Tunisia
Venue:
Proceedings of the 14th International Conference on Information Integration and Web-based Applications & Services
Year:
2012

Abstract

With the growth of the Web and the need for structured knowledge, the formalization of ontologies and the construction of conceptual taxonomies from Web documents have become an active line of research. Traditional ontology learning methods, especially those that extract domain concepts from a textual corpus, usually focus on analyzing the text itself, whether they follow a statistical or a linguistic approach. In this paper, we propose an approach that differs from these methods by using information about the document structure to extract relevant information. Our approach examines each material form in the text in order to extract the most relevant concepts of the ontology for a given field. The concepts are obtained by analyzing the occurrences of candidate terms in the titles and links of the documents and by taking into account the styles used. The approach was evaluated on a French corpus of Web documents from the medical field. Preliminary results are encouraging and seem to validate it. We also present a new method for extracting the hierarchical links between the concepts of the ontology. The taxonomic links are established in three phases: a linguistic step based on the canonical syntactic structure of the extracted concepts; a second step that applies lexico-syntactic patterns conveying the hypernymy relation; and a third step that analyzes the hierarchy of titles in each document to extract taxonomic relations.
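
To make the idea of structure-based concept extraction concrete, the following is a minimal sketch, not the authors' implementation. It assumes that candidate terms are already known and that each occurrence is weighted by the structural context in which it appears (page title, heading, link, emphasized style, or plain body text). The weight values, the tag-to-context mapping, and the names `WEIGHTS`, `TAG_CONTEXT`, `StructureScorer`, and `score_terms` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights per structural context (not taken from the paper).
WEIGHTS = {"title": 3.0, "heading": 2.0, "link": 1.5, "emphasis": 1.25, "body": 1.0}

# Assumed mapping from HTML tags to the structural contexts named in the abstract.
TAG_CONTEXT = {"title": "title", "a": "link"}
TAG_CONTEXT.update({f"h{i}": "heading" for i in range(1, 7)})
TAG_CONTEXT.update({t: "emphasis" for t in ("b", "strong", "i", "em")})


class StructureScorer(HTMLParser):
    """Accumulates a structure-weighted occurrence score per candidate term."""

    def __init__(self, candidates):
        super().__init__()
        # One whole-word, case-insensitive pattern per candidate term.
        self.patterns = {
            term: re.compile(r"\b" + re.escape(term.lower()) + r"\b")
            for term in candidates
        }
        self.open_contexts = Counter()  # how many tags of each context are open
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in TAG_CONTEXT:
            self.open_contexts[TAG_CONTEXT[tag]] += 1

    def handle_endtag(self, tag):
        context = TAG_CONTEXT.get(tag)
        if context and self.open_contexts[context] > 0:
            self.open_contexts[context] -= 1

    def handle_data(self, data):
        # Use the strongest open context; fall back to the body weight.
        active = [c for c, n in self.open_contexts.items() if n > 0]
        weight = max((WEIGHTS[c] for c in active), default=WEIGHTS["body"])
        text = data.lower()
        for term, pattern in self.patterns.items():
            self.scores[term] += weight * len(pattern.findall(text))


def score_terms(html, candidates):
    """Return a dict mapping each candidate term to its weighted score."""
    scorer = StructureScorer(candidates)
    scorer.feed(html)
    scorer.close()
    return dict(scorer.scores)


sample = (
    "<html><head><title>Diabète</title></head><body>"
    "<h1>Diabète de type 2</h1>"
    "<p>Le diabète est une maladie. <a href='#'>insuline</a> et "
    "<b>glycémie</b>. L'insuline régule la glycémie.</p>"
    "</body></html>"
)
ranked = sorted(
    score_terms(sample, ["diabète", "insuline", "glycémie"]).items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```

In this sketch a term that appears in a title or heading outranks one that appears only in running text, which mirrors the intuition stated in the abstract. A real system would also need candidate term extraction and some normalization of the scores across documents, neither of which is covered here.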