Describing Machine Learning With Algebra

I volunteered for a brown bag session recently, and of course I chose Machine Learning as my topic. Machine Learning, however, is not an easy topic to explain. Sure, it is just machines learning, right?

Yes, but there is more to it than that. You understand that it is about learning, but how exactly do you make a machine learn? I talked previously about neural networks, and that gave some insight, but I didn't really go into what actually happens during the steps of learning. Some of it is complicated math, and I might go into that at some point, but I think there is some simple math that can help clear things up a little. To do that, we need to go back to algebra.


As we all recall, algebra is all about solving for x. Many of us took our first look at an equation and said, "what the hell is this?" As an aside, if you are looking to teach algebra to your little ones, you should definitely check out the DragonBox Algebra game. It does an incredible job of taking a hard concept like algebra and turning it into a fun game that helps kids do algebra without even realizing they are doing math!

Let’s take a step back though, and imagine that you are trying to connect two points on a plane:

[Figure: two points on a plane]

The idea is that there is some relationship between these data points that you want to capture so that you can predict other possible data points that are similar. In the case of two points, the best fit is a line:

[Figure: a line drawn through the two points]

Mathematically, there is an equation that defines this line that fits these points:

y = Ax + B
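To make that concrete, here is a minimal sketch in Python that solves for A (the slope) and B (the intercept) given two points. The points themselves are made up for illustration.

```python
# A minimal sketch: solve for the slope (A) and intercept (B) of the line
# y = Ax + B that passes through two example points.
x1, y1 = 1.0, 2.0
x2, y2 = 3.0, 6.0

A = (y2 - y1) / (x2 - x1)  # slope: rise over run
B = y1 - A * x1            # intercept: shift the line so it passes through (x1, y1)

print(f"y = {A}x + {B}")   # prints: y = 2.0x + 0.0
```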

Now imagine we have three points:

[Figure: three points on a plane]

It is pretty clear that our line isn’t going to properly fit these points. But a second-order equation (a parabola) like this might:

y = Ax² + Bx + C

This might look something like this (forgive my rough sketch):

[Figure: a rough sketch of a parabola passing through the three points]

More points might require even higher-order equations. So the first problem is to identify the kind of equation that can represent the solution. The second problem is to solve for the values of A, B, and C in the equation above that give rise to the parabola we see. Supervised machine learning, where we are trying to fit some data in order to make accurate predictions, comes down to solving these same two problems: finding a model that has the ability to fit the data, and then doing the work to find the parameter values that actually let it do so.
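As a concrete example, here is a small Python sketch (using NumPy) that solves for A, B, and C exactly when we have three points. The points are made up for illustration; with exactly three points, the system y = Ax² + Bx + C has a unique solution.

```python
import numpy as np

# Three illustrative points that happen to lie on y = 2x^2 + 1
xs = np.array([-1.0, 0.0, 2.0])
ys = np.array([3.0, 1.0, 9.0])

# Each row of M is [x^2, x, 1], so M @ [A, B, C] = ys
M = np.column_stack([xs**2, xs, np.ones_like(xs)])
A, B, C = np.linalg.solve(M, ys)

print(f"y = {A}x^2 + {B}x + {C}")  # prints roughly: y = 2.0x^2 + 0.0x + 1.0
```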

The details of how we find the values of A, B, and C are a little more complicated than I want to go into today, but it essentially comes down to making some initial guesses at the values, calculating how far off those guesses were, and then slowly adjusting the values in a way that brings the error down. The most common method of doing this is called Gradient Descent; I described it in another article, and there are plenty of materials online that cover it as well. The short answer is that there is a way to calculate the direction in which to change each of the variables you are trying to find, and you do it in small steps so you don't overshoot the optimal values. That is what machine learning training is all about.
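Here is a rough sketch of that idea in Python: guess the values, measure the error, and nudge each value a little bit downhill, over and over. The data, learning rate, and number of steps are made-up illustration values, not tuned recommendations.

```python
import numpy as np

# Illustrative data: roughly y = 2x^2 + 1, with a little noise
xs = np.array([-1.0, 0.0, 2.0, 3.0])
ys = np.array([3.2, 0.9, 9.1, 19.2])

A, B, C = 0.0, 0.0, 0.0   # initial guesses
lr = 0.01                 # small steps so we don't overshoot

for step in range(5000):
    pred = A * xs**2 + B * xs + C
    err = pred - ys                     # how far off each prediction is
    # Gradients of the mean squared error with respect to A, B, and C
    grad_A = 2 * np.mean(err * xs**2)
    grad_B = 2 * np.mean(err * xs)
    grad_C = 2 * np.mean(err)
    # Move each value a small step in the downhill direction
    A -= lr * grad_A
    B -= lr * grad_B
    C -= lr * grad_C

print(f"A={A:.2f}, B={B:.2f}, C={C:.2f}")  # ends up close to A=2, B=0, C=1
```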

I wouldn't call it rocket science, but it definitely involves some math. Fortunately, there are frameworks out there that do most of that math for you, so you can just focus on picking the right model and feeding it the data. Definitely take a look at my article on TensorFlow and Machine Learning Basics for an example of how to do this.
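For a flavor of what that looks like, here is a rough sketch using TensorFlow's Keras API. The layer size, optimizer, and epoch count are placeholder choices for illustration, not recommendations.

```python
import numpy as np
import tensorflow as tf

# Four illustrative samples, one feature each (shape: samples x features)
xs = np.array([[-1.0], [0.0], [2.0], [3.0]])
ys = np.array([[3.0], [1.0], [9.0], [19.0]])

# A small network: one hidden layer is enough to bend a line into a curve
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")  # gradient descent on squared error
model.fit(xs, ys, epochs=500, verbose=0)    # the framework runs the training loop

print(model.predict(np.array([[1.0]])))     # a prediction for a new x value
```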

 
