Statistical modelling of artificial neural networks using the multi-layer perceptron

  • Authors:
  • Murray Aitkin; Rob Foxall

  • Affiliations:
  • Department of Statistics, Newcastle University, Newcastle-upon-Tyne NE1 7RU, UK (both authors)

  • Venue:
  • Statistics and Computing
  • Year:
  • 2003


Abstract

Multi-layer perceptrons (MLPs), a common type of artificial neural network (ANN), are widely used in computer science and engineering for object recognition, discrimination and classification, and have more recently found use in process monitoring and control. “Training” such networks is not a straightforward optimisation problem, and we examine the features of these networks that contribute to the optimisation difficulty.

Although the original “perceptron”, developed in the late 1950s (Rosenblatt 1958, Widrow and Hoff 1960), had a binary output from each “node”, this was not compatible with back-propagation and similar training methods for the MLP, so the output of each node (and of the final network) was made a differentiable function of the network inputs. We reformulate the MLP model with the original perceptron in mind, so that each node in the “hidden layers” can be treated as a latent (that is, unobserved) Bernoulli random variable. This preserves the binary output of the nodes, and with an imposed logistic regression of the hidden-layer nodes on the inputs, the expected output of our model is identical to the output of an MLP with a logistic sigmoid activation function, for the case of one hidden layer (sketched below).

We examine the usual MLP objective function, the sum of squares, and show its multi-modal form and the corresponding optimisation difficulty. We also construct the likelihood for the reformulated latent variable model and maximise it by standard finite mixture ML methods using an EM algorithm (illustrated below), which provides stable ML estimates from random starting positions without the need for regularisation or cross-validation; over-fitting the number of nodes does not affect this stability. This algorithm is closely related to the EM algorithm of Jordan and Jacobs (1994) for the Mixture of Experts model.

We conclude with some general comments on the relation between the MLP and latent variable models.
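
A minimal sketch of the one-hidden-layer equivalence, assuming a linear output node; the notation (k hidden nodes, input weights w_j, output coefficients β_j) is introduced here for illustration and is not taken from the abstract.

```latex
% Each hidden node j is a latent Bernoulli variable whose success
% probability is a logistic regression on the inputs x:
%   Z_j | x ~ Bernoulli( sigma(w_j^T x) ),  sigma(u) = 1 / (1 + e^{-u}).
% With an output linear in the Z_j, linearity of expectation recovers
% the usual one-hidden-layer MLP with logistic sigmoid activations:
\[
  \mathbb{E}[\,Y \mid x\,]
  = \beta_0 + \sum_{j=1}^{k} \beta_j \, \Pr(Z_j = 1 \mid x)
  = \beta_0 + \sum_{j=1}^{k} \beta_j \, \sigma(w_j^{\top} x).
\]
```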
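
The “standard finite mixture ML methods” the abstract invokes follow the generic E-step/M-step pattern sketched below on a two-component Gaussian mixture. This is an illustration of that machinery under assumed, simplified components, not the authors' algorithm, which operates on the latent-Bernoulli MLP likelihood.

```python
# Generic finite-mixture EM, illustrated on a two-component Gaussian
# mixture. Component form, data, and variable names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from two Gaussian components (hypothetical example).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 200)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Random starting values: the abstract's point is that EM is stable from
# random starts, unlike sum-of-squares back-propagation on the MLP.
pi, mu, sigma = 0.5, rng.normal(size=2), np.ones(2)

for _ in range(200):
    # E-step: posterior probability that each point came from component 1.
    p1 = pi * normal_pdf(x, mu[0], sigma[0])
    p2 = (1.0 - pi) * normal_pdf(x, mu[1], sigma[1])
    w = p1 / (p1 + p2)

    # M-step: weighted ML updates of the mixing proportion, means and sds.
    pi = w.mean()
    mu = np.array([np.average(x, weights=w), np.average(x, weights=1.0 - w)])
    sigma = np.sqrt(np.array([
        np.average((x - mu[0]) ** 2, weights=w),
        np.average((x - mu[1]) ** 2, weights=1.0 - w),
    ]))

print("pi:", pi, "mu:", mu, "sigma:", sigma)
```

Each EM iteration cannot decrease the likelihood, which is the source of the stability from random starting positions that the abstract emphasises.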