A proximity measure and a clustering method for concept extraction in an ontology building perspective

Authors:
Guillaume Cleuziou; Sylvie Billot; Stanislas Lew; Lionel Martin; Christel Vrain
Affiliations:
Laboratoire d'Informatique Fondamentale (LIFO), Université d'Orléans, Orléans, France (all authors)
Venue:
ISMIS'06 Proceedings of the 16th international conference on Foundations of Intelligent Systems
Year:
2006

Abstract

In this paper, we study the problem of clustering textual units to help an expert build a specialized ontology. This work was carried out within a French project, Biotim, which handles botany corpora. Building an ontology, whether automatically or semi-automatically, is a difficult task. We focus on one of the main steps of that process: structuring the textual units occurring in the texts into classes likely to represent concepts of the domain. The approach we propose relies on a new non-symmetrical measure for evaluating the semantic proximity between lemmas, taking into account the contexts in which they occur in the documents. We also present an unsupervised classification algorithm designed for this task and this kind of data. The first experiments, performed on botanical data, have given relevant results.
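
The abstract does not give the formula of the proximity measure, so the following is only an illustrative sketch of what a non-symmetrical, context-based proximity between lemmas can look like. It is not the authors' measure. The functions `context_counts` and `proximity`, the fixed-size co-occurrence window, and the toy corpus are all assumptions made for this example: the proximity of lemma `a` to lemma `b` is taken to be the share of `a`'s context occurrences that are also observed around `b`.

```python
from collections import Counter


def context_counts(sentences, window=2):
    """Map each lemma to a Counter of the lemmas seen within `window` positions.

    Hypothetical helper: a fixed window is an assumption of this sketch.
    """
    contexts = {}
    for tokens in sentences:
        for i, lemma in enumerate(tokens):
            lo = max(0, i - window)
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
            contexts.setdefault(lemma, Counter()).update(neighbours)
    return contexts


def proximity(a, b, contexts):
    """Share of a's context occurrences that are also observed for b.

    Normalising by a's context mass only makes the measure asymmetric:
    proximity(a, b) generally differs from proximity(b, a).
    """
    ca = contexts.get(a, Counter())
    cb = contexts.get(b, Counter())
    total = sum(ca.values())
    if total == 0:
        return 0.0
    shared = sum(min(n, cb[c]) for c, n in ca.items())
    return shared / total


# Toy, made-up lemmatised sentences (not taken from the Biotim corpora).
toy = [
    ["leaf", "green", "oval"],
    ["petal", "green"],
    ["leaf", "oval", "large"],
]
ctx = context_counts(toy)
print(proximity("petal", "leaf", ctx))  # 1.0
print(proximity("leaf", "petal", ctx))  # 0.25
```

In this toy corpus every context of "petal" is also a context of "leaf", while "leaf" has contexts that "petal" lacks, so the two directions give different scores. This asymmetry is what lets a rare, specific lemma be close to a frequent, general one without the converse holding.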