It’s been a while since I’ve written, but it isn’t about not having things to talk about. Really, it has been about just finding the time. As a parent, my first priority is my family and my children. At their age, they require a lot of attention, and honestly I think it is important that they get it, because if you don’t give your kids attention, they will find it in other places which you might not approve of.

Anyhow, my day job has certainly kept me busy. For a long time, my focus was largely on Docker and Containers. I’m still considered one of the resident subject matter experts, but I moved into a new organization that is more focused on data. Hadoop is going to be a big part of a lot of what I do in the next few months, but in the meantime I’m helping teams to modernize their infrastructure, development practices, and frameworks. I’ve become the product owner for a platform that will incorporate rules, machine learning, and lots of workflow management.

Machine Learning has become my new passion, and frankly containers just don’t excite me any more. It isn’t that containers don’t have value – in fact, there is definitely some possibility that I’ll be returning to them in the future as a method of distributing computing tasks – it’s just that there isn’t as much to learn there. When I see some new framework around containers, I read about it, but I’ve pretty much absorbed it all within a few minutes. Machine Learning is different. It’s easy to understand on the surface, but when you dive down into the details, things get quite complicated.

Basic machine learning is easy to grasp. One good example is predicting the price of a house based on a few different criteria such as number of rooms, floorspace, location, etc. You could imagine a simple rules-based approach driven by a lookup table:

This approach certainly can work, but you have to manually adjust the rules as trends change, and it doesn’t quite capture the real correlation between the features (what machine learning folks like to call the inputs) and the resulting price. The prices could actually be the result of a complex combination of the features, something like this:

price = 34 * floorspace + 2700 * numRooms + 818437 * locAvgMonthlyIncome + 7364

I’m just making this up, clearly, but the point is that how certain factors contribute to a price is more than likely something non-trivial. Part of the work of machine learning is to pick a model that can approximate these relationships. Simple options are things like linear regression, and more complex options are things like neural networks. The more complex the option, the more computing power is required to “train” the model.
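To make the made-up formula a bit more tangible, here is a minimal Python sketch of it. The function name and argument names are my own choices for illustration, and the coefficients are the invented ones from above, so don't read anything into the numbers.

```python
def predict_price(floorspace, num_rooms, loc_avg_monthly_income):
    """Estimate a house price from three features.

    The coefficients are the invented ones from the formula above,
    not values learned from real data.
    """
    return (34 * floorspace
            + 2700 * num_rooms
            + 818437 * loc_avg_monthly_income
            + 7364)


# Hypothetical inputs, just to show the call.
print(predict_price(floorspace=100, num_rooms=3, loc_avg_monthly_income=1))
```

Each feature simply gets multiplied by its own weight and the results are added up, which is all a linear model ever does.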

The example I provided above is an example of linear regression. Machine Learning folks would write it this way:

y = Theta1 * x1 + Theta2 * x2 + Theta3 * x3 + Theta4

The variables x1, x2, and x3 are the features. Theta1, Theta2, Theta3 and Theta4 are parameters that need to be adjusted to produce a value that is approximately the right value for the given inputs. This is what it means to train a model. So how do you adjust the values? Essentially you build another function that estimates how far off your prediction is. The simple method is to take the difference between the guess and the real value and square it:

error = 0
for every (y, actualPrice) pair:
    error = error + (y - actualPrice)^2

This gives you an estimate of where you are. This is known as the cost or loss function.
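The pseudocode above translates almost word for word into Python. This is just a sketch: the function name and the two list arguments are names I picked, and it assumes the predictions and the actual prices are lined up in the same order.

```python
def squared_error_cost(predictions, actual_prices):
    """Sum of squared differences between each guess and the real value."""
    error = 0
    for y, actual_price in zip(predictions, actual_prices):
        error = error + (y - actual_price) ** 2
    return error


# Two guesses: the first is off by 2, the second is exact.
print(squared_error_cost([3, 5], [1, 5]))
```

Squaring the difference means that guessing too high and guessing too low are both penalized, and big misses count for much more than small ones.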

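The next thing you do is figure out how to adjust the parameters to reduce the error. One common method is called gradient descent. You compute the partial derivative of the cost function with respect to each parameter, which tells you how steeply the error rises or falls as that parameter changes. If you plot the error against the parameters, you get a graph like this: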

The easiest way to think of this is to imagine that you are on a mountain top trying to find your way down. You look all around you to find the steepest slope down, and then you take a step. You again look around for the steepest slope down and take a step. As you gradually get closer to the bottom, the slope gets shallower and shallower until you reach a point where it starts sloping up. When you hit this point, you have essentially hit a minimum and further adjustment will only increase your error, not decrease it. This is the goal: to find the minimal amount of error given the model.

When you start looking for that minimum error, you have to start somewhere, so you generally pick some values for Theta1, Theta2, etc., which will place you somewhere on the graph. For linear regression with a squared-error cost there is only one minimum, but with more complex models there is some chance that you might hit a local minimum which isn’t the global minimum, so sometimes you have to run the exercise a few times from different starting points to see if you have hit the real global minimum. Once you determine the optimal values of the parameters, you likely now have a function that will predict the value you are looking for based on the inputs with a reasonable amount of error.
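Here is a rough sketch of what walking down the mountain could look like in plain Python for the linear model. A few things here are my own assumptions rather than anything sacred: the function names, starting every Theta at zero, the learning rate (the size of each step), the number of steps, and averaging the squared error over the examples so the step size doesn't depend on how much data you have.

```python
def predict(thetas, xs):
    """Linear model: one theta per feature, plus a final constant term."""
    return sum(t * x for t, x in zip(thetas, xs)) + thetas[-1]


def gradient_descent(rows, targets, learning_rate=0.01, steps=5000):
    """Fit thetas by repeatedly stepping against the slope of the cost."""
    n_features = len(rows[0])
    m = len(rows)
    thetas = [0.0] * (n_features + 1)  # starting point: an arbitrary choice
    for _ in range(steps):
        grads = [0.0] * (n_features + 1)
        for xs, actual in zip(rows, targets):
            diff = predict(thetas, xs) - actual
            # Partial derivative of the mean squared error for each theta.
            for j, x in enumerate(xs):
                grads[j] += 2 * diff * x / m
            grads[-1] += 2 * diff / m
        # Take one step downhill.
        thetas = [t - learning_rate * g for t, g in zip(thetas, grads)]
    return thetas


# Toy data generated from y = 2 * x + 1, so we know what to expect.
rows = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]
print(gradient_descent(rows, targets, learning_rate=0.05, steps=2000))
```

On this toy data the result should land very close to 2 for the feature weight and 1 for the constant. If you make the learning rate too large, each step overshoots the bottom of the valley and the error grows instead of shrinking, so picking the step size is part of the tuning.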

You can’t be 100% sure, however, based on your known data. Sometimes you never really find a great minimum, and the fit against the training data isn’t very good (the error rate is still high). This is often called underfitting or high bias. Chances are that you might need different features or more training data or even a better model with more complex computations. A different problem is when the model fits the training data well but still has high error when used with data not in the training set. This is called overfitting or high variance. You might need fewer features or a simpler model because your model is just not general enough to make good predictions.
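One way to tell these two problems apart is to hold some data back from training and compare the error on both sets. The sketch below is only a rule of thumb: the function names are mine, and acceptable_error is a hypothetical threshold that you would have to choose for your own problem.

```python
def mean_squared_error(predictions, actuals):
    """Average squared difference between guesses and real values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


def diagnose(train_error, held_out_error, acceptable_error):
    """Rough classification of a fit; acceptable_error is a made-up threshold."""
    if train_error > acceptable_error:
        # The model can't even fit the data it was trained on.
        return "underfitting (high bias)"
    if held_out_error > acceptable_error:
        # Fits the training data, but fails on data it hasn't seen.
        return "overfitting (high variance)"
    return "reasonable fit"


print(diagnose(train_error=1.0, held_out_error=80.0, acceptable_error=10.0))
```

The important habit is to never judge a model only on the data it was trained on.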

There is a method called regularization that helps prevent overfitting by adding a penalty term to the cost function, typically one that grows with the size of the parameters, so that the model doesn’t just follow the training data values strictly.
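As an illustration, here is a sketch of one common flavor, usually called L2 or ridge regularization, where the penalty is the sum of the squared parameters. I'm assuming that flavor for the example; the function names and the lam parameter (how strongly to penalize) are my own, and leaving the constant term out of the penalty is a common convention rather than a rule.

```python
def predict(thetas, xs):
    """Linear model: one theta per feature, plus a final constant term."""
    return sum(t * x for t, x in zip(thetas, xs)) + thetas[-1]


def regularized_cost(thetas, rows, targets, lam):
    """Squared error plus an L2 penalty on the feature weights."""
    squared_error = sum((predict(thetas, xs) - actual) ** 2
                        for xs, actual in zip(rows, targets))
    # Penalize large feature weights; the constant term (last theta) is left out.
    penalty = lam * sum(t ** 2 for t in thetas[:-1])
    return squared_error + penalty


# A perfect fit still pays a price for its weight of 2.
print(regularized_cost([2.0, 1.0], [[1.0], [2.0]], [3.0, 5.0], lam=0.5))
```

With lam set to zero you are back to the plain cost function, and the larger you make it, the more the training pushes the parameters toward small values at the expense of fitting the training data exactly.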

The real work of machine learning is to try different configurations of a model so that you end up with a nice fit that balances bias against variance and gives you reasonable error for predicting outputs based on inputs. There are a number of great tools out there, and in my next post I’ll talk about TensorFlow, one of the best frameworks for developing sophisticated machine learning applications that is also easy to use.