Stochastic modelling of scientific terms distribution in publications

  • Authors:
  • Rimantas Rudzkis; Vaidas Balys; Michiel Hazewinkel

  • Affiliations:
  • Institute of Mathematics and Informatics, Vilnius, Lithuania; Institute of Mathematics and Informatics, Vilnius, Lithuania; Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands

  • Venue:
  • MKM'06 Proceedings of the 5th international conference on Mathematical Knowledge Management
  • Year:
  • 2006

Abstract

In this paper, we address the problem of automatically assigning keywords to scientific publications. We consider using textual traces, learned from training data in a supervised manner, to identify appropriate keywords. We introduce the transparent concept of an identification cloud as a means of representing the semantics of scientific terms. This concept is defined mathematically by models of the stochastic distributions of scientific terms over publication texts. We present characteristics of the models as well as procedures for both non-parametric and parametric estimation of the probability distributions.
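
As a rough illustration of the supervised, non-parametric idea described in the abstract, the following Python sketch estimates a simple empirical word distribution for one keyword from labelled training texts and uses it to score a new text. It is not the authors' model: the function names `estimate_cloud` and `cloud_score`, the whitespace tokenisation, the averaging score, and the toy training data are all assumptions made only for this example.

```python
from collections import Counter


def estimate_cloud(training_docs, keyword):
    """Empirical (relative-frequency) word distribution over all training
    texts labelled with `keyword`. Hypothetical stand-in for a cloud."""
    counts = Counter()
    for text, keywords in training_docs:
        if keyword in keywords:
            counts.update(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cloud_score(text, cloud):
    """Average cloud probability over the words of `text`
    (0.0 for an empty text). The scoring rule is an assumption."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(cloud.get(w, 0.0) for w in words) / len(words)


# Toy training data (invented for illustration): (text, set of keywords).
training_docs = [
    ("markov chain transition matrix stationary distribution", {"Markov chain"}),
    ("markov chain ergodic stationary measure", {"Markov chain"}),
    ("group ring module homomorphism", {"ring theory"}),
]

cloud = estimate_cloud(training_docs, "Markov chain")
related = cloud_score("stationary markov process", cloud)
unrelated = cloud_score("module homomorphism", cloud)
```

In this toy setting a text that shares vocabulary with the keyword's training texts receives a higher score than one that does not. A parametric variant would replace the raw relative frequencies with a fitted distribution family.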