increases (see Fig. 13.6a). Further increasing the number of neurons raises the R coefficient for the training dataset, which approaches a value of 1 (Figs. 13.5 and 13.6). However, the regression coefficient for the test dataset quickly decreases to average values below 0.8, with large error bars (Figs. 13.5 and 13.6). This observation shows that overfitting occurs when the number of neurons is increased to six or more. One way to prevent this overfitting is to apply Bayesian regularisation in combination with Levenberg–Marquardt training (Figs. 13.5 and 13.6). This is one of the techniques aimed at better generalisation. With this approach, the results obtained for different numbers of neurons in the hidden layer are appreciably more stable (see Fig. 13.6b). Values of the R coefficient for both the training and test datasets increase as the number of neurons is increased to eight. A further increase in the number of neurons does not result in overfitting, and the regression results for the training and test datasets remain comparable (see Fig. 13.5c,d). However, increasing the number of neurons above eight gives no appreciable improvement, while the training time increases significantly. Considering all the above, the optimal architecture for this neural network has eight neurons in the hidden layer.
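The selection procedure described above can be illustrated with a short sketch. The code below is not the MATLAB Neural Network Toolbox implementation used for the models in this book; it is a hypothetical Python example (scikit-learn, synthetic data) that sweeps the number of hidden neurons, computes the regression coefficient R for training and test data, and uses a fixed L2 penalty as a simple stand-in for Bayesian regularisation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (the real models use measured process/property data)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def r_coefficient(model, X, y):
    """Linear correlation coefficient R between predicted and target values."""
    return np.corrcoef(model.predict(X), y)[0, 1]

# Sweep the number of neurons in the single hidden layer
for n_hidden in range(2, 13, 2):
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                       activation='tanh',   # s-shaped transfer function
                       solver='lbfgs',      # quasi-Newton batch training
                       alpha=1e-2,          # L2 penalty: crude proxy for
                                            # Bayesian regularisation
                       max_iter=5000,
                       random_state=0).fit(X_tr, y_tr)
    print(f"{n_hidden:2d} neurons:  R(train) = {r_coefficient(net, X_tr, y_tr):.3f}"
          f"   R(test) = {r_coefficient(net, X_te, y_te):.3f}")
```

With the penalty term removed, the gap between training and test R would be expected to widen as neurons are added, which is the overfitting behaviour discussed above.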
The type of training algorithm used is important for both the neural network response and the central processing unit (CPU) time (or computational resources) required for training. A summary analysis of the influence of different training algorithms on the performance of the neural network model is given below. Different training options are available, including training on variations of the mean square error for better generalisation, training against a validation set, and training until the gradient of the error reaches a minimum.
Improvement of the generalisation can be achieved by means of regularisation
and early stopping with validation. The general opinion is that automated
regularisation based on Bayesian regularisation in combination with
Levenberg–Marquardt training usually gives the best result, in terms of model
performance and training speed. This combination is used in the neural
networks described in this book unless otherwise stated. However, in some
cases, other training algorithms such as one-step secant, Polak–Ribière conjugate gradient, and variable learning rate may give competitive results. In the software developed and described in this book, the user can choose the training algorithm when re-training the model or predicting values.
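As an illustration of one of these options, training against a validation set with early stopping, the hypothetical Python sketch below uses scikit-learn's MLPRegressor. Levenberg–Marquardt training with Bayesian regularisation (and the other algorithms named above) is not available in that library, so the gradient-based 'adam' solver is used instead, with part of the training data held out internally as a validation set.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data, as in the previous sketch
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(300, 4))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

# Early stopping against a validation set: 15% of the training data is held
# out and training stops when the validation score no longer improves.
net = MLPRegressor(hidden_layer_sizes=(8,),
                   activation='tanh',
                   solver='adam',
                   early_stopping=True,
                   validation_fraction=0.15,
                   n_iter_no_change=25,
                   max_iter=5000,
                   random_state=0)
net.fit(X, y)

print("iterations used:", net.n_iter_)
print("best validation score (R^2):", round(net.best_validation_score_, 3))
```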
The transfer function transforms the neuron input value into the output
value. The most popular functions are hard limit (hardlim), linear (purelin),
log sigmoid (logsig), and hyperbolic tangent sigmoid (tansig). In all the NN cases considered here, a linear transfer function is suitable in the output layer. The transfer functions in the hidden layer are generally s-shaped curves, with the output value confined within the limits (0, 1) or (–1, 1) for the log-sigmoid and tan-sigmoid functions, respectively. Mathematical formulations of the main transfer functions and their general shapes can be found in Chapter 14.
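As a quick illustration (the formal definitions are given in Chapter 14), the hypothetical Python sketch below implements these four transfer functions with NumPy, using the function names quoted above.

```python
import numpy as np

def hardlim(n):
    """Hard limit: 0 for n < 0, 1 otherwise."""
    return np.where(n < 0.0, 0.0, 1.0)

def purelin(n):
    """Linear: output equals input (used here in the output layer)."""
    return n

def logsig(n):
    """Log-sigmoid: output confined to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def tansig(n):
    """Hyperbolic tangent sigmoid: output confined to (-1, 1)."""
    return np.tanh(n)

n = np.linspace(-3.0, 3.0, 7)
for f in (hardlim, purelin, logsig, tansig):
    print(f.__name__, np.round(f(n), 3))
```

In the model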