A neural model for unsupervised taxonomy enrichment

Authors:
Emil Şt. Chifu; Ioan Alfred Leţia
Affiliations:
Technical University of Cluj-Napoca, Cluj-Napoca, Romania; Technical University of Cluj-Napoca, Cluj-Napoca, Romania
Venue:
Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services
Year:
2008

Abstract

The construction of complete and reliable domain ontologies is the most important prerequisite for the success of Semantic Web research. In this paper we describe an unsupervised framework for domain ontology enrichment that is based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an extended model of hierarchical self-organizing maps. Because it is founded on an unsupervised neural network architecture, the framework can be applied to different languages and domains. Terms extracted by mining a text corpus are represented as vectors in a distributional space that encodes their contextual information. The enrichment works as a classification of the extracted terms into the existing taxonomy: each term is attached as a hyponym of one of the taxonomy's nodes. The reported experiments are in the "Lonely Planet" tourism domain, using the taxonomy and corpus proposed in the PASCAL ontology learning and population challenge. The experimental results show that the quality of the enrichment improves considerably when semantics-based vector representations are used for the classified (newly added) terms, such as document category histograms (DCH) and the document frequency times inverse term frequency (DF-ITF) weighting scheme.
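The idea of enrichment as classification into a taxonomy can be illustrated with a small sketch. It is not the authors' extended hierarchical self-organizing map. It assumes that a document category histogram is a term's occurrence counts over labeled document categories, normalized to sum to one (our reading, not a definition taken from the paper). It also assumes that each taxonomy node's prototype is the mean vector of the terms already in its subtree. SOM training is replaced by a greedy top-down descent to the closest (best-matching) node by Euclidean distance. The corpus, taxonomy, node names, and functions below are hypothetical toy data, and DF-ITF weighting is not modeled.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (document category, document text).
DOCS = [
    ("beaches", "the beach and the coast near the bay"),
    ("beaches", "a sandy beach on the coast"),
    ("museums", "the museum and the gallery downtown"),
    ("museums", "an art gallery next to the museum"),
    ("food", "a cafe serving local food"),
    ("food", "the restaurant and the cafe"),
]
CATEGORIES = sorted({cat for cat, _ in DOCS})

# Hypothetical toy taxonomy: node -> children, and node -> terms attached to it.
ROOT = "destination_feature"
CHILDREN = {
    ROOT: ["natural_site", "cultural_site", "eating_place"],
    "natural_site": [],
    "cultural_site": [],
    "eating_place": [],
}
TERMS = {ROOT: [], "natural_site": ["beach"], "cultural_site": ["museum"], "eating_place": ["cafe"]}


def category_histogram(term):
    """Assumed DCH: normalized counts of `term` over document categories."""
    counts = Counter()
    for cat, text in DOCS:
        counts[cat] += text.split().count(term)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CATEGORIES)
    return [counts[c] / total for c in CATEGORIES]


def subtree_terms(node):
    """All terms attached to `node` or to any of its descendants."""
    terms = list(TERMS[node])
    for child in CHILDREN[node]:
        terms.extend(subtree_terms(child))
    return terms


def prototype(node):
    """Mean histogram of the node's subtree terms, or None if it has no terms."""
    vectors = [category_histogram(t) for t in subtree_terms(node)]
    if not vectors:
        return None
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def classify(term):
    """Descend from the root while the closest child beats the current node,
    then attach `term` as a hyponym of the node where the descent stops."""
    vec = category_histogram(term)
    node = ROOT
    while True:
        candidates = [c for c in CHILDREN[node] if prototype(c) is not None]
        if not candidates:
            break
        best = min(candidates, key=lambda c: math.dist(vec, prototype(c)))
        if math.dist(vec, prototype(best)) >= math.dist(vec, prototype(node)):
            break
        node = best
    TERMS[node].append(term)
    return node


for new_term in ["coast", "gallery", "restaurant"]:
    print(new_term, "->", classify(new_term))
# coast -> natural_site
# gallery -> cultural_site
# restaurant -> eating_place
```

The descent stops at an internal node when no child prototype is closer than the node's own prototype. A term with a diffuse histogram is therefore attached high in the hierarchy rather than forced into a leaf. Appending each new term to `TERMS` shifts the prototypes, which is only a crude stand-in for the weight adaptation of a self-organizing map.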