Connectionist Networks [Based on class handout of 9/10/98.]
[Figures and tables not reproduced: (1) 2 Inputs, 1 Output; (2) Input/Output; (3) 2 Inputs, 1 Output; (4) Input/Output; (5) Input/Output]
Questions about Neural Networks that should not be confused with one another...


Sigmoid Activation Function

Network Weights After 20,000 sweeps of training

Output activations
- using and.20000.wts and and.data (Training Set)
- 0.001
- 0.081
- 0.083
- 0.903
The goal of the network is for the output to be "on" only when both of the inputs are "on". The sigmoid activation function has the effect that the output node is "on" (activation above 0.5) when its net input is above zero, and "off" (activation below 0.5) when its net input is below zero. The bias node has a strong negative weight, so positive input from either one of the inputs alone is not enough to override the effect of the bias and bring the net input above zero. Only when both inputs are sending activation to the output node is the effect of the bias overcome.
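The reasoning above can be checked with a few lines of Python. This is only a sketch: it assumes the standard logistic sigmoid, and the weights (`BIAS`, `W1`, `W2`) are hypothetical values picked by hand for illustration, not the contents of and.20000.wts.

```python
import math

def sigmoid(net):
    """Logistic activation: 0.5 at net = 0, rising toward 1 above, falling toward 0 below."""
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical weights chosen for illustration only (NOT read from and.20000.wts).
BIAS = -7.0  # strong negative weight from the bias node
W1 = 4.6     # weight from input 1 to the output node
W2 = 4.6     # weight from input 2 to the output node

def and_output(i1, i2):
    """Activation of the output node for one input pattern."""
    net = BIAS + W1 * i1 + W2 * i2
    return sigmoid(net)

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(i1, i2, round(and_output(i1, i2), 3))
```

With these assumed weights, one input alone leaves the net input negative (4.6 - 7.0), while both together push it positive (9.2 - 7.0), which is the pattern the paragraph describes.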

In order for the network to model the XOR function, we need activation of either of the inputs to turn the output node "on", just as in the OR network. In the OR network this was achieved easily by making the negative weight on the bias smaller in magnitude than the positive weight on either of the inputs. However, in the XOR network we also want the effect of turning both inputs on to be to turn the output node "off". Since both input weights must be positive, turning both inputs on can only increase the total input to the output node, and the output is switched "off" only when it receives less input. This effect therefore cannot be achieved when the inputs are connected directly to the output node.
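A brute-force check makes the same point. The sketch below tries every combination of bias and input weights on a coarse grid (the grid itself is an arbitrary assumption) and counts how many combinations reproduce AND, OR, and XOR when the output counts as "on" for net input above zero. It is an illustration, not a proof, but no setting on the grid produces XOR.

```python
from itertools import product

PATTERNS = [(0, 0), (1, 0), (0, 1), (1, 1)]
TARGETS = {
    "AND": [0, 0, 0, 1],
    "OR":  [0, 1, 1, 1],
    "XOR": [0, 1, 1, 0],
}

def is_on(bias, w1, w2, i1, i2):
    """Output node is 'on' when its net input is above zero."""
    return bias + w1 * i1 + w2 * i2 > 0

def count_solutions(targets, grid):
    """Count (bias, w1, w2) settings on the grid that reproduce the target outputs."""
    count = 0
    for bias, w1, w2 in product(grid, repeat=3):
        outputs = [int(is_on(bias, w1, w2, i1, i2)) for i1, i2 in PATTERNS]
        if outputs == targets:
            count += 1
    return count

# Assumed search grid: integer weights from -10 to 10.
GRID = list(range(-10, 11))

solution_counts = {name: count_solutions(t, GRID) for name, t in TARGETS.items()}
print(solution_counts)
```

AND and OR each have many solutions on the grid, while the XOR count is zero.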


- NETWORK CONFIGURED BY TLEARN
- # weights after 5000 sweeps
- # WEIGHTS
- # TO NODE 1
- -3.0456776619 ## bias to 1
- 5.5165352821 ## i1 to 1
- -5.7562727928 ## i2 to 1
- 0.0000000000 ## only 1 0 turns node 1 "on"
- 0.0000000000
- 0.0000000000
- # TO NODE 2
- -3.6789164543 ## bias to 2
- -6.4448370934 ## i1 to 2
- 6.4957633018 ## i2 to 2
- 0.0000000000 ## only 0 1 turns node 2 "on"
- 0.0000000000
- 0.0000000000
- # TO NODE 3
- -4.4429202080 ## bias to output
- 0.0000000000
- 0.0000000000
- 9.0652370453 ## 1 to output
- 8.9045801163 ## 2 to output
- 0.0000000000 ## these connections like OR network
- ## except 1 & 2 are never both "on"
Output activations
- using xor.5000.wts and xor.data (Training Set)
- 0.022
- 0.980
- 0.981
- 0.020
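The output activations can be reproduced approximately from the weights listed above. The sketch below assumes the logistic sigmoid and reads each "TO NODE" block in the order bias, i1, i2, node 1, node 2, node 3, as the comments in the listing suggest. The function name `xor_forward` is made up for this example, and the order of patterns in xor.data is not shown here, so the loop order is also an assumption.

```python
import math

def sigmoid(net):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-net))

# Nonzero weights copied from the listing above.
NODE1 = (-3.0456776619, 5.5165352821, -5.7562727928)  # bias, i1, i2 -> node 1
NODE2 = (-3.6789164543, -6.4448370934, 6.4957633018)  # bias, i1, i2 -> node 2
NODE3 = (-4.4429202080, 9.0652370453, 8.9045801163)   # bias, node 1, node 2 -> output

def xor_forward(i1, i2):
    """Return (node 1, node 2, output) activations for one input pattern."""
    h1 = sigmoid(NODE1[0] + NODE1[1] * i1 + NODE1[2] * i2)
    h2 = sigmoid(NODE2[0] + NODE2[1] * i1 + NODE2[2] * i2)
    out = sigmoid(NODE3[0] + NODE3[1] * h1 + NODE3[2] * h2)
    return h1, h2, out

for i1, i2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    h1, h2, out = xor_forward(i1, i2)
    print(i1, i2, round(h1, 3), round(h2, 3), round(out, 3))
```

Printing the hidden activations alongside the output shows node 1 responding only to the pattern 1 0 and node 2 only to 0 1, so the output node can treat them like the two inputs of an OR network.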
The fact that there is a network configuration which deals quite well with XOR does not guarantee that a network starting with random weights and using a simple learning algorithm will be able to find this configuration.
Random Seed
The random seed is of little importance, except as a tool in helping you use the simulator.
Except where explicitly specified, the networks always begin training with all weights between nodes set randomly. Although the network should hopefully solve the learning problem from any random starting point, different starting points are liable to lead to slightly different solutions and slightly different courses of development. The seed is a number that is used to generate random numbers; when the same random seed is used, the same random set of initial weights is used. By allowing you to control the random initial settings in this way, the tlearn program makes it easier to replicate earlier demonstrations by yourself and others.
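The role of the seed can be sketched in Python. This is not tlearn's own random number generator: the function name `initial_weights` and the weight range are assumptions made for the example. The point is only that the same seed yields the same starting weights.

```python
import random

def initial_weights(seed, n_weights, low=-0.5, high=0.5):
    """Draw n_weights starting weights from a generator initialised with seed.

    The range [low, high] is an assumed value, not taken from tlearn.
    """
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n_weights)]

run_a = initial_weights(seed=1, n_weights=9)
run_b = initial_weights(seed=1, n_weights=9)  # same seed: identical weights
run_c = initial_weights(seed=2, n_weights=9)  # different seed: different weights

print(run_a == run_b, run_a == run_c)
```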
Learning Rate
The learning rate, which is explained in chapter 1 (pp. 12-13), is a training parameter that determines how strongly the network responds to an error signal at each training cycle. The higher the learning rate, the bigger the weight change the network will make in response to a given error. Sometimes a high learning rate is beneficial; at other times it can be quite disastrous for the network. You can see an example of sensitivity to learning rate in the case of the XOR network discussed in chapter 4.
Why should it be a bad thing to make big corrections in response to big errors? The reason for this is that the network is looking for the best general solution to mapping all of the input-output pairs, but the network normally adjusts weights in response to an individual input-output pair. Since the network has no knowledge of how representative any individual input-output pair is of the general trend in the training set, it would be rash for the network to respond too strongly to any individual error signal. By making many small responses to the error signals, the network learns a bit more slowly, but it is protected against being messed up by outliers in the data.
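To make the effect of the learning rate concrete, here is a sketch of a single weight update for one sigmoid unit with one input. It uses the textbook delta rule for a sigmoid unit, which is an assumption here and may differ in detail from what tlearn computes. The function name `weight_change` is hypothetical.

```python
import math

def sigmoid(net):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-net))

def weight_change(weight, inp, target, learning_rate):
    """Weight change for one input-output pair under the textbook delta rule."""
    output = sigmoid(weight * inp)
    error = target - output
    # The error is scaled by the sigmoid's slope and by the learning rate.
    return learning_rate * error * output * (1.0 - output) * inp

small = weight_change(weight=0.0, inp=1.0, target=1.0, learning_rate=0.1)
large = weight_change(weight=0.0, inp=1.0, target=1.0, learning_rate=1.0)
print(small, large)
```

The same error on the same pattern moves the weight ten times further at the higher rate, so a single unrepresentative pattern has ten times the influence.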
Momentum
Just as with learning rate, sometimes the learning algorithm can only find a good solution to a problem if the momentum training parameter is set to a specific value. What does this mean, and why should it make a difference?
If momentum is set to a high value, then the weight changes made by the network are very similar from one cycle to the next. If momentum is set to a low value, then the weight changes made by the network can be very different on adjacent cycles. So what?
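The effect can be sketched with the common textbook form of the momentum update, in which each weight change is the learning rate times the current error signal plus the momentum times the previous change. Whether tlearn uses exactly this form is an assumption; the signal values below are made up.

```python
def weight_changes(error_signals, learning_rate, momentum):
    """Successive weight changes under the textbook momentum rule."""
    changes = []
    previous = 0.0
    for signal in error_signals:
        change = learning_rate * signal + momentum * previous
        changes.append(change)
        previous = change
    return changes

# Made-up error signals: a steady push with one reversal at the sixth cycle.
SIGNALS = [1.0, 1.0, 1.0, 1.0, 1.0, -1.0, 1.0, 1.0]

low = weight_changes(SIGNALS, learning_rate=0.1, momentum=0.0)
high = weight_changes(SIGNALS, learning_rate=0.1, momentum=0.9)

print([round(c, 3) for c in low])
print([round(c, 3) for c in high])
```

With momentum at 0.0 the weight change flips direction as soon as the signal does. With momentum at 0.9 the accumulated earlier changes outweigh the single reversal, so the change stays in the same direction, and the changes also grow larger over the first few cycles.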
The problem that the network faces is that it is searching for the best available configuration to model the training data, but it has no knowledge of what the best solution is, or even whether there is a particularly good solution at all. It therefore needs some efficient and reliable way of searching the range of possible weight-configurations for the best available solution.
One thing that can be done is for the network to test whether any small changes to its current weight-configuration lead to improved performance. If so, then it can make that change. Then it can ask the same question in its new weight-configuration, and again modify the weights if there is a small change that leads to improvement. This is a fairly effective way for a blind search to proceed, but it has inherent dangers: the network might come across a weight-configuration which is better than all very similar configurations, but is not the best configuration of all. In this situation, the network can figure out that no small changes improve performance, and will therefore not modify its weights. It therefore thinks that it has reached an optimal solution, but this is an incorrect conclusion. This problem is known as a local maximum or local minimum.
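The search procedure just described can be sketched on a toy problem. The list `ERROR` below holds made-up error values for a row of neighbouring weight-configurations; it is not derived from any real network. The search moves to a neighbour only if that neighbour has lower error.

```python
# Hypothetical error values for a row of neighbouring weight-configurations.
# Index 2 is a local minimum (error 3); index 7 is the best overall (error 0).
ERROR = [5, 4, 3, 4, 5, 2, 1, 0, 1]

def local_search(start):
    """Step to the better neighbour until no neighbour improves on the current error."""
    position = start
    while True:
        neighbours = [p for p in (position - 1, position + 1) if 0 <= p < len(ERROR)]
        best = min(neighbours, key=lambda p: ERROR[p])
        if ERROR[best] >= ERROR[position]:
            return position  # no small change helps, so the search stops here
        position = best

print(local_search(0), local_search(4))
```

Starting from index 0 the search stops at index 2, where both neighbours are worse, even though index 7 has lower error. Starting from index 4 it reaches index 7. Where the search ends depends on where it starts.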
Momentum can serve to help the network avoid local maxima, by controlling the scale at which the search for a solution proceeds. If momentum is set high, then changes in the weight-configuration are very similar from one cycle to the next. A consequence of this is that early in training, when error levels are typically high, weight changes will be consistently large. Because weight changes are forced to be large, this can help the network avoid getting trapped in a local maximum.