I’m trying to understand how to train a multilayer perceptron (MLP); however, I’m having some trouble figuring out how to determine a suitable network architecture, i.e., the number of nodes/neurons in each layer of the network.
For a specific task, I have four input sources that can each take one of three states. I guess that would mean four input neurons firing either 0, 1 or 2, but I’ve been told that input should be kept binary. Is that right?
Furthermore, I’m having some trouble choosing the number of neurons in the hidden layer. Any comments would be great.
Thanks.
Determining an acceptable network structure for a multilayer perceptron is actually straightforward.
Input Layer: How many features/dimensions are in your data, i.e., how many columns are in each data row? Add one to this (for the bias node) and that is the number of nodes for the first (input) layer.
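The input-layer rule can be sketched as a small helper. This is a minimal illustration, assuming the data is a list of equal-length rows; the function name `input_layer_size` and the sample rows are hypothetical.

```python
def input_layer_size(rows):
    """Number of input nodes: one per feature column, plus one bias node."""
    n_features = len(rows[0])
    # Every row should have the same number of columns (features).
    if any(len(row) != n_features for row in rows):
        raise ValueError("rows have inconsistent column counts")
    return n_features + 1  # +1 for the bias node


# Hypothetical data: four input sources, each in state 0, 1 or 2.
data = [
    [0, 2, 1, 1],
    [2, 0, 0, 1],
    [1, 1, 2, 0],
]
print(input_layer_size(data))  # 5
```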
Output Layer: Is your MLP running in ‘classification’ mode or ‘regression’ mode (‘regression’ used here in the machine learning rather than the statistical sense), i.e., does your MLP return a class label or a predicted value? If the latter, then your output layer has a single node. If the former, then your output layer has the same number of nodes as class labels. For instance, if the result you want is to label each instance as either “fraud” or “not fraud”, that’s two class labels, and therefore two nodes in your output layer.
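The output-layer rule amounts to a two-way choice, sketched below. The mode strings and the name `output_layer_size` are hypothetical labels for the two modes described above, not part of any library.

```python
def output_layer_size(mode, class_labels=None):
    """One node for regression; one node per distinct class label for classification."""
    if mode == "regression":
        return 1
    if mode == "classification":
        # Count distinct labels, so repeated labels in the data are not double-counted.
        return len(set(class_labels))
    raise ValueError("mode must be 'classification' or 'regression'")


print(output_layer_size("classification", ["fraud", "not fraud", "fraud"]))  # 2
print(output_layer_size("regression"))  # 1
```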
Hidden Layer(s): In between these two (input and output) are, obviously, the hidden layers. Always start with a single hidden layer. So how many nodes? Here’s a rule of thumb: set the (initial) size of the hidden layer to some number of nodes just slightly greater than the number of nodes in the input layer. Compared with having fewer nodes than the input layer, this excess capacity will help your numerical optimization routine (e.g., gradient descent) converge.
In sum, begin with three layers for your network architecture; the sizes of the first (input) and last (output) layers are fixed by your data and by your model design, respectively. A hidden layer just slightly larger than the input layer is nearly always a good design to begin with.
So in your case, a suitable network structure to begin with would be:
input layer: 5 nodes (4 features + 1 bias) -> hidden layer: 7 nodes -> output layer: 3 nodes
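To make the 5-7-3 shapes concrete, here is a minimal forward pass in pure Python. Several choices are assumptions not stated in the answer: the three output nodes are treated as three class labels, hidden nodes use a sigmoid activation, outputs use a softmax, the weights are random and untrained, and (matching the node count above) the only bias is the constant input node. The helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def init_weights(n_out, n_in, rng):
    # One row of n_in weights for each of the n_out receiving nodes.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def forward(features, w_hidden, w_output):
    # Prepend the constant bias input: 4 features + 1 bias = 5 input nodes.
    x = [1.0] + [float(v) for v in features]
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_output]
    return softmax(scores)  # one probability per output node


rng = random.Random(0)
w_hidden = init_weights(7, 5, rng)  # 7 hidden nodes, each fed by 5 input nodes
w_output = init_weights(3, 7, rng)  # 3 output nodes, each fed by 7 hidden nodes
probs = forward([0, 2, 1, 1], w_hidden, w_output)
print(len(probs), round(sum(probs), 6))  # 3 1.0
```

Training (e.g., adjusting `w_hidden` and `w_output` by gradient descent) is left out; the point is only how the layer sizes determine the weight-matrix shapes.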