The Age of Spiritual Machines: When Computers Exceed Human Intelligence
Author: Ray Kurzweil
Once upon a time two daughter sciences were born to the new science of cybernetics. One sister was natural, with features inherited from the study of the brain, from the way nature does things. The other was artificial, related from the beginning to the use of computers. Each of the sister sciences tried to build models of intelligence, but from very different materials. The natural sister built models (called neural networks) out of mathematically purified neurones. The artificial sister built her models out of computer programs.

In their first bloom of youth the two were equally successful and equally pursued by suitors from other fields of knowledge. They got on very well together. Their relationship changed in the early sixties when a new monarch appeared, one with the largest coffers ever seen in the kingdom of the sciences: Lord DARPA, the Defense Department's Advanced Research Projects Agency. The artificial sister grew jealous and was determined to keep for herself the access to Lord DARPA's research funds. The natural sister would have to be slain.

The bloody work was attempted by two staunch followers of the artificial sister, Marvin Minsky and Seymour Papert, cast in the role of the huntsman sent to slay Snow White and bring back her heart as proof of the deed. Their weapon was not the dagger but the mightier pen, from which came a book—Perceptrons—purporting to prove that neural nets could never fulfill their promise of building models of mind: only computer programs could do this.

Victory seemed assured for the artificial sister. And indeed, for the next decade all the rewards of the kingdom came to her progeny, of which the family of expert systems did best in fame and fortune. But Snow White was not dead. What Minsky and Papert had shown the world as proof was not the heart of the princess; it was the heart of a pig.
MATHLESS "PSEUDO CODE" FOR THE NEURAL NET ALGORITHM

Here is the basic schema for a neural net algorithm. Many variations are possible, and the designer of the system needs to provide certain critical parameters and methods, detailed below.

The Neural Net Algorithm

Creating a neural net solution to a problem involves the following steps:
• Define the input.
• Define the topology of the neural net (i.e., the layers of neurons and the connections between the neurons).
• Train the neural net on examples of the problem.
• Run the trained neural net to solve new examples of the problem.
• Take your neural net company public.
These steps (except for the last one) are detailed below.

The Problem Input

The problem input to the neural net consists of a series of numbers. This input can be:
• in a visual pattern-recognition system: a two-dimensional array of numbers representing the pixels of an image; or
• in an auditory (e.g., speech) recognition system: a two-dimensional array of numbers representing a sound, in which the first dimension represents parameters of the sound (e.g., frequency components) and the second dimension represents different points in time; or
• in an arbitrary pattern-recognition system: an n-dimensional array of numbers representing the input pattern.

Defining the Topology

To set up the neural net, the architecture of each neuron consists of:
• Multiple inputs, in which each input is "connected" either to the output of another neuron or to one of the input numbers.
• Generally, a single output, which is connected either to the input of another neuron (usually in a higher layer) or to the final output.

Set up the first layer of neurons:
• Create N0 neurons in the first layer. For each of these neurons, "connect" each of the multiple inputs of the neuron to "points" (i.e., numbers) in the problem input. These connections can be determined randomly or using an evolutionary algorithm (see below).
• Assign an initial "synaptic strength" to each connection created. These weights can start out all the same, can be assigned randomly, or can be determined in another way (see below).

Set up the additional layers of neurons: set up a total of M layers of neurons. For each layer i:
• Create Ni neurons in layer i. For each of these neurons, "connect" each of the multiple inputs of the neuron to the outputs of the neurons in layer i-1 (see variations below).
• Assign an initial "synaptic strength" to each connection created. These weights can start out all the same, can be assigned randomly, or can be determined in another way (see below).
• The outputs of the neurons in layer M are the outputs of the neural net (see variations below).

The Recognition Trials

How each neuron works: once the neuron is set up, it does the following for each recognition trial.
• Each weighted input to the neuron is computed by multiplying the output of the other neuron (or initial input) that the input to this neuron is connected to by the synaptic strength of that connection.
• All of these weighted inputs to the neuron are summed.
• If this sum is greater than the firing threshold of this neuron, then this neuron is considered to "fire" and its output is 1. Otherwise, its output is 0 (see variations below).

Do the following for each recognition trial. For each layer, from layer 0 to layer M, and for each neuron in each layer:
• Sum its weighted inputs (each weighted input = the output of the other neuron [or initial input] that the input to this neuron is connected to, multiplied by the synaptic strength of that connection).
• If this sum of weighted inputs is greater than the firing threshold for this neuron, set the output of this neuron to 1; otherwise, set it to 0.

To Train the Neural Net

• Run repeated recognition trials on sample problems.
• After each trial, adjust the synaptic strengths of all the interneuronal connections to improve the performance of the neural net on this trial (see the discussion below on how to do this).
• Continue this training until the accuracy rate of the neural net is no longer improving (i.e., reaches an asymptote).

Key Design Decisions

In the simple schema above, the designer of this neural net algorithm needs to determine at the outset:
• What the input numbers represent.
• The number of layers of neurons.
• The number of neurons in each layer (each layer does not necessarily need to have the same number of neurons).
• The number of inputs to each neuron in each layer. The number of inputs (i.e., interneuronal connections) can also vary from neuron to neuron and from layer to layer.
• The actual "wiring" (i.e., the connections). For each neuron, in each layer, this consists of a list of other neurons, the outputs of which constitute the inputs to this neuron. This represents a key design area. There are a number of possible ways to do this:
(i) wire the neural net randomly; or
(ii) use an evolutionary algorithm (see the next section of this Appendix) to determine an optimal wiring; or
(iii) use the system designer's best judgment in determining the wiring.
• The initial synaptic strengths (i.e., weights) of each connection.
There are a number of possible ways to do this:
(i) set the synaptic strengths to the same value; or
(ii) set the synaptic strengths to different random values; or
(iii) use an evolutionary algorithm to determine an optimal set of initial values; or
(iv) use the system designer's best judgment in determining the initial values.
• The firing threshold of each neuron.
• Determine the output. The output can be:
(i) the outputs of layer M of neurons; or
(ii) the output of a single output neuron, whose inputs are the outputs of the neurons in layer M; or
(iii) a function of (e.g., a sum of) the outputs of the neurons in layer M; or
(iv) another function of neuron outputs in multiple layers.
• Determine how the synaptic strengths of all the connections are adjusted during the training of this neural net. This is a key design decision and the subject of a great deal of neural net research and discussion. There are a number of possible ways to do this:
(i) For each recognition trial, increment or decrement each synaptic strength by a (generally small) fixed amount so that the neural net's output more closely matches the correct answer. One way to do this is to try both incrementing and decrementing and see which has the more desirable effect. This can be time-consuming, so other methods exist for making local decisions on whether to increment or decrement each synaptic strength.
(ii) Other statistical methods exist for modifying the synaptic strengths after each recognition trial so that the performance of the neural net on that trial more closely matches the correct answer.

Note that neural net training will work even if the answers to the training trials are not all correct. This allows using real-world training data that may have an inherent error rate. One key to the success of a neural net-based recognition system is the amount of data used for training. Usually a very substantial amount is needed to obtain satisfactory results. Just like human students, the amount of time that a neural net spends learning its lessons is a key factor in its performance.

Variations

Many variations of the above are feasible. Some variations include:
• There are different ways of determining the topology, as described above. In particular, the interneuronal wiring can be set either randomly or using an evolutionary algorithm.
• There are different ways of setting the initial synaptic strengths, as described above.
• The inputs to the neurons in layer i do not necessarily need to come from the outputs of the neurons in layer i-1. Alternatively, the inputs to the neurons in each layer can come from any lower layer, or from any layer.
• There are different ways to determine the final output, as described above.
• For each neuron, the method described above compares the sum of the weighted inputs to the threshold for that neuron. If the threshold is exceeded, the neuron fires and its output is 1. Otherwise, its output is 0. This "all or nothing" firing is called a nonlinearity. There are other nonlinear functions that can be used. Commonly, a function is used that goes from 0 to 1 in a rapid but more gradual fashion than all or nothing. Also, the outputs can be numbers other than 0 and 1.
• The different methods for adjusting the synaptic strengths during training, briefly described above, represent a key design decision.
• The above schema describes a "synchronous" neural net, in which each recognition trial proceeds by computing the outputs of each layer, starting with layer 0 through layer M. In a true parallel system, in which each neuron is operating independently of the others, the neurons can operate asynchronously (i.e., independently). In an asynchronous approach, each neuron is constantly scanning its inputs and fires (i.e., changes its output from 0 to 1) whenever the sum of its weighted inputs exceeds its threshold (or, alternatively, using another nonlinear output function).

Happy Adaptation!
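The schema above can be sketched in runnable form. This is a minimal illustration, not Kurzweil's own code: it assumes a fully connected layered topology, all-or-nothing threshold neurons, random initial synaptic strengths, and the training method from Key Design Decisions in which each weight is nudged up and down and the better-performing value is kept. All names (NeuralNet, run, train, and so on) are invented for this sketch.

```python
import random

class NeuralNet:
    """Minimal layered threshold net following the schema in the text."""

    def __init__(self, n_inputs, layer_sizes, threshold=0.5, seed=1):
        rng = random.Random(seed)
        self.threshold = threshold
        self.layers = []  # one list of per-neuron weight vectors per layer
        prev = n_inputs
        for size in layer_sizes:
            # Each neuron in layer i is wired to every output of layer i-1
            # (the first layer is wired to the problem input); initial
            # synaptic strengths are random values in [-1, 1].
            self.layers.append(
                [[rng.uniform(-1, 1) for _ in range(prev)] for _ in range(size)]
            )
            prev = size

    def run(self, inputs):
        """One recognition trial: compute layer 0 through layer M in order."""
        outputs = inputs
        for layer in self.layers:
            # A neuron fires (output 1) when its weighted-input sum
            # exceeds the firing threshold; otherwise its output is 0.
            outputs = [
                1 if sum(w * x for w, x in zip(weights, outputs)) > self.threshold
                else 0
                for weights in layer
            ]
        return outputs

    def _error(self, samples):
        """Total disagreement between net outputs and correct answers."""
        return sum(
            sum(abs(o, ) if False else abs(o - t) for o, t in zip(self.run(x), t_vec))
            for x, t_vec in samples
        )

    def train(self, samples, step=1.0, epochs=25):
        """Try incrementing and decrementing each synaptic strength; keep
        whichever change reduces the error. The text suggests a small fixed
        step, but with hard-threshold neurons the error only changes when a
        weight crosses a decision boundary, so this sketch uses a coarse step.
        """
        for _ in range(epochs):
            base = self._error(samples)
            if base == 0:
                break  # accuracy can no longer improve
            for layer in self.layers:
                for weights in layer:
                    for i in range(len(weights)):
                        original = weights[i]
                        best_err, best_w = base, original
                        for candidate in (original + step, original - step):
                            weights[i] = candidate
                            err = self._error(samples)
                            if err < best_err:
                                best_err, best_w = err, candidate
                        weights[i] = best_w
                        base = best_err

# Train on OR: output 1 if either input is 1 (a tiny one-neuron net suffices).
samples = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [1])]
net = NeuralNet(n_inputs=2, layer_sizes=[1])
net.train(samples)
```

After training, running the net on each sample input reproduces the OR truth table. The same class can be instantiated with more layers (e.g., layer_sizes=[3, 1]), though this simple increment/decrement training works best on easy, linearly separable problems, which is exactly the limitation Minsky and Papert analyzed.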