So I’ve been studying and making a lot of noise about machine learning.
I know that a lot of people aren’t as excited about it as I am, but some people definitely are starting to get excited about it, so I’m glad that I’m getting in on it. I wouldn’t exactly say that I’m getting in on it early, because folks at MIT and other universities and even some companies have been doing machine learning for decades.
What is really creating the excitement these days is the frameworks and resources that are making machine learning available to almost everyone. It used to be that models like neural nets were limited to folks with large clusters of machines that could handle the massive calculations required to train a model. Now, with companies like Amazon and Google offering computing power that you can easily take advantage of, almost anyone can train a complicated model to make predictions. Add to that approachable frameworks like TensorFlow, and suddenly it seems like everyone is using machine learning.
While using machine learning has become easy, understanding what it is actually doing is still complicated. My current supervisor and I were talking about machine learning, and he raised a great question: how do you explain what the machine learning model actually does to an executive who has no technical background in machine learning? Particularly, how do you explain what is going on in the hidden layers of a neural network?
Let’s start by talking about how neural networks got their name. Obviously it refers to neurons, but why? Picture your brain:
What the heck is going on in there? Scientists actually don’t entirely know. Studies are still going on to map out different regions of the brain and figure out how it actually all works. We do know, however, that it starts with neurons, which are specialized cells that link together to exchange information:
Most of us have seen this image before, and we understand that a neuron receives signals through its dendrites and then relays more signals through the axon terminals at the end. We know how these neurons carry information to muscles all throughout our bodies.
The real question is how does the brain do image recognition? When you look at a photo, you can easily spot particular things within it. For example, look at the following photo and find at least four children:
How long did that take you? It might take you a few seconds, but it was pretty fast. How would we train a computer to do this task? Obviously we would first teach the computer to recognize faces, and then we would train the computer to differentiate a young face from an old face. The interesting thing about this problem is that we are even able to detect young faces when they are far away and not clear. For example, near the top is a girl in red who is clearly a child. How can we tell? Because she’s sitting on her father’s shoulders. It was actually some other complex information that clued us in to the fact that she was a child. The other thing to note is that you didn’t look at every single face in the photo. In fact, you most likely didn’t even look at faces to begin with – you probably looked for smaller people or smaller heads. You were able to completely ignore most of the faces in the photo because they didn’t meet some criteria, and that allowed your brain to quickly zero in on the children.
How did we get so good at this? Training. When you were born, you didn’t know how to recognize children from a picture. You didn’t even know what a child was. Your brain starts out with some simple abilities (such as breathing), and the rest pretty much has to be figured out. You literally have to train your brain to do almost everything. Just the way that you learned to walk, to ride a bike, and to even speak. It didn’t happen overnight, and in a lot of cases, you had to keep on training and correcting for months in order to be able to do it correctly.
So what is happening when we train our brains? There is a lot of studying going on with this, but the evidence points to the brain forming new connections as things are learned. Your brain is literally rewiring itself continually every day. There is information being stored away, and we aren’t quite sure exactly how. This idea of connections, however, is a big part of how neural networks work.
The most basic example of a neural network is one that can identify handwritten numbers. There is a data set called the MNIST database of handwritten digits that you can freely download. In the database are thousands of images of handwritten numbers. Many types of machine learning models can be trained to identify the numbers, but neural networks allow you to push the accuracy much higher.
The idea is pretty simple. You take the image and convert it into a series of numbers, each number representing the intensity of a pixel in the image. You feed these numbers into a well-trained neural network, and out of the network comes a set of numbers – usually ten, one for each digit – of which one will be 1 (or close to 1) and the rest will be 0 (or very close to 0). In between the inputs and the outputs is a set of one or more layers (called hidden layers) that are connected to each other and also to the input and output layers. It basically looks like this:
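That first conversion step can be sketched in a few lines. This is a minimal illustration, assuming grayscale pixel intensities from 0 to 255 that get scaled into the 0–1 range; the tiny 3×3 image is a made-up stand-in for a real 28×28 MNIST image.

```python
def image_to_inputs(image):
    """Flatten a 2-D grid of 0-255 pixel values into one list of 0-1 floats."""
    return [pixel / 255.0 for row in image for pixel in row]

tiny_image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 16],
]  # a 3x3 stand-in; real MNIST images are 28x28

inputs = image_to_inputs(tiny_image)
print(len(inputs))   # → 9, one value per pixel
print(inputs[1])     # 128/255, a mid-gray pixel
```

For a real MNIST image the same function would produce 784 input values, one for each of the 28×28 pixels.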
Notice that each of the inputs is connected to every unit in the hidden layer in the middle, and that each unit in the hidden layer is connected to the output layer. So when you feed the numbers in through the input layer, each of those values gets distributed to every unit in the hidden layer. The units in the hidden layer combine the values from the various inputs, multiplying each by a distinct weight and summing the results. (In practice, each unit also passes its sum through a nonlinear activation function – that nonlinearity is what lets the network learn patterns more complicated than a simple weighted average.) The resulting value is then passed to all the units of the next layer. The output layer works the same way, combining its inputs scaled by distinct weights.
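The feed-forward step above can be sketched directly. Everything here is an illustrative assumption – the tiny layer sizes, the example weights, and the choice of a sigmoid squashing function – not values from a real trained network; the point is just the flow of values from layer to layer.

```python
import math

def sigmoid(x):
    """A common squashing function that maps any sum into the 0-1 range."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(values, weight_rows):
    """One unit per row of weights: weighted sum of all inputs, then squash."""
    return [sigmoid(sum(v * w for v, w in zip(values, row)))
            for row in weight_rows]

inputs = [0.0, 0.5, 1.0]                 # three input values (pixels)
hidden_weights = [[0.1, -0.2, 0.3],      # two hidden units, each with
                  [0.4, 0.1, -0.5]]      # one distinct weight per input
output_weights = [[0.2, -0.1]]           # one output unit

hidden = layer(inputs, hidden_weights)   # every input reaches every hidden unit
output = layer(hidden, output_weights)   # every hidden unit reaches the output
print(hidden, output)
```

Notice that the hidden layer and the output layer reuse the same `layer` function – each layer really does work the same way, just with its own set of weights.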
How do we determine the weights? Training, of course, but it is a complicated process. It actually requires a pass through the network and back. First we initialize the weights randomly – if every weight started out the same, every unit in a layer would compute the same value and receive the same adjustment, so the units would never differentiate and the model wouldn’t actually learn. Then we pass input in and do all the weight calculations until the output is computed. Now we can compare the output to what we expect to see and compute an error for each of the outputs. For example, when we first feed the network an image of the number 7, we might get some output data like this:
output 0: 0.485673
output 1: 0.234857
output 2: 0.578578
output 3: 0.756385
output 4: 0.583758
output 5: 0.227621
output 6: 0.978373
output 7: 0.376482
output 8: 0.856756
output 9: 0.037573
Clearly, this isn’t right. We then calculate the difference between these values and the expected values – every output should be 0 except output 7, which should be 1. This difference is fed backwards through the network through a process called backpropagation. Essentially, we have a way of figuring out how much each unit in the previous layer contributed to the error, so we can make a small adjustment to its weights that should reduce the error on the next run. We then continue further back to the layer before the one we just adjusted and determine, again, how much each unit in that layer contributed to the error calculated from the layer after it.
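One backpropagation step can be sketched on a toy network. This is a minimal illustration, not anything specific to MNIST: the 2–3–1 layer sizes, the sigmoid units, the squared-error measure, and the learning rate are all assumed for the sketch. The point is the flow – forward pass, measure the error against the target, push the blame back layer by layer, and nudge each weight a little.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)                                   # reproducible random init
n_in, n_hid = 2, 3
w_hid = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
w_out = [random.uniform(-1, 1) for _ in range(n_hid)]
lr = 0.1                                         # small learning rate

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hid]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    return h, o

x, target = [0.0, 1.0], 1.0
h, o = forward(x)
error_before = (target - o) ** 2

# Output layer first: the sigmoid's derivative is o * (1 - o).
delta_o = (o - target) * o * (1 - o)
w_out_old = list(w_out)                          # blame is computed with the
for j in range(n_hid):                           # weights as they were
    w_out[j] -= lr * delta_o * h[j]

# Then one layer further back: each hidden unit's share of the blame
# flows back through the weight that connected it to the output.
for j in range(n_hid):
    delta_h = delta_o * w_out_old[j] * h[j] * (1 - h[j])
    for i in range(n_in):
        w_hid[j][i] -= lr * delta_h * x[i]

_, o_after = forward(x)
error_after = (target - o_after) ** 2
print(error_before, "->", error_after)           # the error shrinks a bit
```

Running the same input through the adjusted network produces a slightly smaller error – each backward pass only nudges the weights, which is why training takes so many passes.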
Once all the weights have been adjusted, we run another input through the network, calculate the error, and adjust the weights again. How long one pass takes depends a lot on how many layers there are and how many units are in each layer. Adding more layers and more units can increase the accuracy of the neural network, but it also increases the amount of time it takes to run all the calculations through the network, compute the error, and adjust the weights. This is a big reason why neural networks fell out of favor after they were first developed but are now becoming popular again – with the increased parallelism and computing power available today, really big neural networks can be trained in a fraction of the time it would have taken years ago.
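The repeated train-measure-adjust loop can be sketched as well. To stay self-contained this trains a single sigmoid unit (no hidden layer) on a made-up four-example task – learning a simple OR-like rule – rather than MNIST images; the data, starting weights, and learning rate are all assumptions. The rhythm is the same as for a big network: forward pass, compute the error, adjust the weights, move on to the next example, and keep cycling.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Four made-up training examples: output 1 if either input is on.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

w = [0.1, -0.1]          # small starting weights
bias = 0.0
lr = 1.0
history = []             # total error after each full pass

for epoch in range(2000):
    total = 0.0
    for x, target in data:
        o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
        total += (target - o) ** 2
        delta = (o - target) * o * (1 - o)      # the backward step, for one unit
        for i in range(len(w)):
            w[i] -= lr * delta * x[i]
        bias -= lr * delta
    history.append(total)

print(f"total error: {history[0]:.4f} -> {history[-1]:.4f}")
```

Even on this tiny task the error drops gradually over many passes rather than in one jump, which hints at why big networks with many layers need so much computing power to train.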
It is possible that this is how the brain actually works as well. Input is fed into the brain, and at first we can’t make heads or tails of what that input is, but somehow the brain makes adjustments and things start getting clearer. Eventually the brain becomes so well tuned that we can recognize complicated shapes out of a photograph in a really short period of time. We do know that some things must be trained early on, such as language, because if these things are not done early they become hard to learn later in life. It could be because things like language are paramount to a lot of the other learning that we do. When you look at a photo, you not only see shapes, you see things and your brain finds words to associate with those things so that you know what you are looking at.
I believe that we are a long way off from having an artificial intelligence that can really work the same way our brains do, but it could be that we just need to keep on training our networks and making them bigger and bigger. It takes the average person almost two decades to achieve what we consider maturity, so how can we expect a computer to learn the same things in just hours or days? We are still in the pioneer days of artificial intelligence and machine learning (computers as we know them have only been around for a few decades), but I believe that as time goes on, we’ll get closer and closer to building smart machines that think like we do.
Should we be afraid of terminators trying to kill us some day? Who knows. A computer will ultimately just be calculating some path to a goal, and it might not be programmed to look out for ethical boundaries. Having said that, those boundaries are something we learned along the way as we grew up, so we’ll just have to make sure that we train the machines to understand those rules as well.