Stochastic modelling of scientific terms distribution in publications

Authors:
Rimantas Rudzkis;Vaidas Balys;Michiel Hazewinkel
Affiliations:
Institute of Mathematics and Informatics, Vilnius, Lithuania;Institute of Mathematics and Informatics, Vilnius, Lithuania;Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands
Venue:
MKM'06 Proceedings of the 5th international conference on Mathematical Knowledge Management
Year:
2006

Citing 11
Cited 1

Word association norms, mutual information, and lexicography

Computational Linguistics
Expert network: effective and efficient learning from human decisions in text categorization and retrieval

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
The nature of statistical learning theory

The nature of statistical learning theory
Machine learning in automated text categorization

ACM Computing Surveys (CSUR)
Text Categorization with Suport Vector Machines: Learning with Many Relevant Features

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Latent dirichlet allocation

The Journal of Machine Learning Research
A Linear Least Squares Fit mapping method for information retrieval from natural language texts

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Word-sense disambiguation using statistical models of Roget's categories trained on large corpora

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
Measures of distributional similarity

ACL '99 Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics
Parsing a natural language using mutual information statistics

AAAI'90 Proceedings of the eighth National conference on Artificial intelligence - Volume 2
Probabilistic latent semantic analysis

UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence

Statistical Classification of Scientific Publications

Informatica

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper, we address the problem of automatic keywords assignment to scientific publications. The idea to use textual traces learned from training data in a supervised manner to identify appropriate keywords is considered. We introduce the transparent concept of identification cloud as a means to represent the semantics of scientific terms. This concept is mathematically defined by models of scientific terms stochastic distributions over publication texts. Characteristics of models as well as procedures for both non-parametric and parametric estimation of probability distributions are presented.