Machine Learning has been all the rage lately, but some of us were introduced to it decades ago. I first encountered it in my artificial intelligence class at MIT.
When I took the class, it was taught by Patrick Winston, who wrote the book that was considered the definitive guide on artificial intelligence at the time. That book is still used in a lot of places, but I’m sure it has been replaced by other books that have expanded on the ideas and include more discussion about modern frameworks that have made this subject more tangible.
The class was taught in a language called Scheme. For those who don’t know about scheme, it is a dialect of LISP (also known as Lots of Irritating and Silly Parentheses). LISP and its dialects use parentheses to enclose expressions. The syntax is sometimes known as Polish Notation, but LISP is more explicit about the grouping of operators so that programmers don’t have to think so hard about the calculation stack. For example, this is an expression written in Polish Notation:
− × ÷ 15 − 7 + 1 1 3 + 2 + 1 1
These expressions are generally evaluated from right to left, and the interpreter that interprets this statement uses a stack. So the first thing it does is push 1 onto the stack, followed by pushing another 1. Then it encounters the plus, so it pops both 1’s off the stack and adds them, pushing 2 onto the stack. It translates to this:
push 1 onto stack  push 1 onto stack [1 1] pop two operands (1, 1) and add  push 2 onto stack  push 2 onto stack [2 2] pop two operands (2, 2) and add  push 4 onto stack  push 3 onto stack [3 4] push 1 onto stack [1 3 4] push 1 onto stack [1 1 3 4] pop two operands (1, 1) and add [3 4] push 2 onto stack [2 3 4] push 7 onto stack [7 2 3 4] pop two operands (7, 2) and subtract [3 4] push 5 onto stack [5 3 4] push 15 onto stack [15 5 3 4] pop two operands (15, 5) and divide [3 4] push 3 onto stack [3 3 4] pop two operands (3, 3) and multiply  push 9 onto stack [9 4] pop two operands (9, 4) and subtract  Result is 5
Developers used to write expressions like this all the time, and it was slow and cumbersome, and so LISP added the parentheses to make it clearer what was happening. The above expression could instead be written like this:
(- (× (÷ 15 (- 7 (+ 1 1))) 3) (+ 1 1 2))
At first glance, this may not seem much better, but you can easily see how things get reduced:
(- (× (÷ 15 (- 7 (+ 1 1))) 3) (+ 1 1 2)) (- (× (÷ 15 (- 7 2)) 3) (+ 1 1 2)) (- (× (÷ 15 (- 7 2)) 3) 4) (- (× (÷ 15 5) 3) 4) (- (× 3 3) 4) (- 9 4) 5
Because of this, it was much easier to write and understand what a LISP program was doing versus the old stack based expression.
The above example demonstrates numerical computations in LISP, but it also has functions as well. Data structures in Scheme are made up of lists. For example, I can define a variable x to be the list of the first five integers:
(define x (1 2 3 4 5))
Accessing the data inside this structure uses two main keywords, car and cdr. These functions return the first element of a list, and the rest of the list, respectively. This means:
(car x) -> 1 (cdr x) -> (2 3 4 5)
You can probably see that accessing certain elements of the list can get kind of tricky. For example, to get the 4th element of the list (4), you need to do this:
(car (cdr (cdr (cdr x)))) -> 4
The expression works in the reverse order of how it reads from left to right – you have to remove 1, 2, and 3 using cdr and then use car to get the first element of the list (4 5).
Suffice to say that this language comes from a time when programming languages were much more primitive. There was, however, some concepts in the Scheme language that are just now reemerging into programming languages. One such example of this is called a lambda expression. These expressions were introduced to Java in the 1.8 JDK, but they actually have been around as a programming construct for more than 30 years. A lambda expression is essentially an expression that doesn’t evaluate immediately but instead requires values for placeholders to be provided in order to be evaluated. For example, I can define a lambda expression in scheme and set it to a variable x:
(define x (lambda (x y) (+ x y)))
This essentially defines an expression that adds its inputs together. It is kind of like defining a function except that it actually gets stored in a variable and can be passed around (a pointer to a function is kind of a good analogy, but lambda expressions are more like data that can be stored on the stack as opposed to just pointing to some instruction in the program data space). You can then evaluate the lambda expression like this:
(x 1 2) -> 3
So why would anyone do this? A good example is the factory pattern. Say I want to build a custom expression that multiplies one argument by a multiplier that is a parameter to the factory. For example:
(define factory (lambda (multiplier) (lambda (x) (× x multiplier))))
The factory itself is a lambda expression that returns another lambda expression. I could make a triple multiplier and use it like this:
(define tripler (factory 3)) -> (lambda (x) (× x 3)) (tripler 2) -> 6
Or imagine an even more complex example where the factory creates an expression that does any operation with a parameter:
(define factory2 (lambda (operator value) (lambda (x) (operator x value)))) (define divideBySix (factory ÷ 6)) -> (lambda (x) (÷ x 6)) (divideBySix 18) -> 3
This is a simple example, but you can imagine building expressions that efficiently compute expressions based on configuration parameters. In the machine learning world, we are often doing complex computations with parameters that require tuning, so we needed a way to be able to define the computations with placeholders that could be changed as we trained the network to be more accurate.
Python of course allows us to do this in a much more succinct way, but the principles are essentially the same:
def factory2(operator, value): def evaluator(input): return operator(input, value) return evaluator