The standard inner product of two vectors has some nice geometric properties. Given two vectors $x, y \in \mathbb{R}^n$, where $x_i$ denotes the $i$-th coordinate of $x$, the standard inner product (which I will interchangeably call the dot product) is defined by the formula

$$\langle x, y \rangle = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$

This formula, simple as it is, produces a lot of interesting geometry. An important such property, one discussed in machine learning circles more than in pure math, is that the inner product gives a very convenient decision rule.
In particular, say we’re in the Euclidean plane, and we have a line $L$ passing through the origin, with a vector $w$ perpendicular to $L$ (the “normal” to the line).
If you take any vector $x$, then the dot product $\langle x, w \rangle$ is positive when $x$ lies on the same side of $L$ as $w$, negative when $x$ lies on the opposite side, and zero exactly when $x$ lies on $L$.

Left: the dot product of $x$ and $w$ is positive, meaning they are on the same side of $L$. Right: the dot product is negative, and they are on opposite sides.
Here is an interactive demonstration of this property. Click the image below to go to the demo, and you can drag the vector arrowheads and see the decision rule change.

[Image: decision-rule interactive demo]
The code for this demo is available in a github repository.
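To make the rule concrete in code, here is a minimal sketch (not the demo’s source; the vectors and the helper name `side_of_line` are made up for illustration) of the same sign test in Python:

```python
import numpy as np

# w is a normal vector to the line L through the origin (made up for illustration).
w = np.array([1.0, 2.0])

def side_of_line(x, w):
    """Return +1, -1, or 0: which side of L (the line perpendicular to w) the vector x falls on."""
    return int(np.sign(np.dot(x, w)))

print(side_of_line(np.array([2.0, 1.0]), w))    # +1: same side as w
print(side_of_line(np.array([-2.0, -1.0]), w))  # -1: opposite side
print(side_of_line(np.array([2.0, -1.0]), w))   #  0: exactly on L
```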
It’s always curious, at first, that multiplying and summing produces such geometry. Why should this seemingly trivial arithmetic do anything useful at all?
The core fact that makes it work, however, is that the dot product tells you how one vector projects onto another. When I say “projecting” a vector $x$ onto another vector $w$, I mean you take only the part of $x$ that points in the direction of $w$.
In two dimensions this is easy to see, as you can draw the triangle which has $x$ as the hypotenuse and one leg lying along the direction of $w$.
If we call $a$ the leg lying along $w$ and $b$ the other leg (which is perpendicular to $w$), then as vectors $x = a + b$, and the projection of $x$ onto $w$ is exactly $a$.
Another way to think of this is that the projection is what remains of $x$ after you remove the part of $x$ that is perpendicular to $w$.
And if $w$ is a unit vector, the length of the projection of $x$ onto $w$ is exactly the inner product $\langle x, w \rangle$.
Moreover, if the angle between $x$ and $w$ is larger than 90 degrees, the projection points in the direction opposite to $w$, so it is really a signed length: positive when the projection points along $w$, and negative when it points against $w$.

Left: the projection points in the same direction as $w$. Right: the projection points in the opposite direction.
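As a quick numerical check (a sketch with made-up example vectors, assuming numpy), the signed length of the projection of $x$ onto a unit vector $w$ agrees with the dot product:

```python
import numpy as np

x = np.array([3.0, 1.0])           # example vector (made up for illustration)
w = np.array([1.0, 2.0])
w = w / np.linalg.norm(w)          # make w a unit vector

# The projection of x onto w is (x . w) w, and its signed length is x . w.
signed_length = np.dot(x, w)
projection = signed_length * w

print(signed_length)               # ~2.236: positive, so the projection points along w
print(projection)                  # the component of x in the direction of w, here [1., 2.]
print(np.dot(x - projection, w))   # ~0: what remains of x is perpendicular to w
```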
And this is precisely why the decision rule works: the sign of $\langle x, w \rangle$ is the sign of the projection of $x$ onto $w$, and the 90-degree boundary between the positive and negative cases is exactly the line perpendicular to $w$.
More technically said: Let $x, y \in \mathbb{R}^n$ be two vectors with $y \neq 0$.
Theorem: Geometrically, $\langle x, y \rangle$ is the signed length of the projection of $x$ onto the unit vector $y / \|y\|$, scaled by the length $\|y\|$.
This theorem is true for any dimension $n$, since two vectors span at most a 2-dimensional plane, and the projection happens entirely inside that plane.
In fact, the usual formula for the angle between two vectors, i.e. the formula $\langle x, y \rangle = \|x\| \|y\| \cos(\theta)$, is just a restatement of this projection property in trigonometric language; a proof is included at the end of this post.
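To see the two readings agree, here is a small sketch with made-up vectors, assuming numpy; the angle is computed independently with atan2 so the check is not circular:

```python
import numpy as np

x = np.array([3.0, 1.0])   # example vectors (made up for illustration)
y = np.array([1.0, 4.0])

# Projection reading: signed length of x projected onto y/||y||, scaled by ||y||.
proj_reading = np.dot(x, y / np.linalg.norm(y)) * np.linalg.norm(y)

# Trigonometric reading: compute the angle between x and y independently
# (via atan2), then evaluate ||x|| ||y|| cos(theta).
theta = np.arctan2(y[1], y[0]) - np.arctan2(x[1], x[0])
trig_reading = np.linalg.norm(x) * np.linalg.norm(y) * np.cos(theta)

print(np.dot(x, y), proj_reading, trig_reading)  # all three agree (up to floating point)
```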
Part of why this decision rule property is so important is that the map $x \mapsto \langle x, w \rangle$ is a linear function, and linear functions can be optimized relatively easily. When I say that, I specifically mean that there are many known algorithms for optimizing linear functions which don’t have obscene runtime or space requirements. This is a big reason why mathematicians and statisticians start the mathematical modeling process with linear functions: they’re inherently simpler.
In fact, there are many techniques in machine learning—a prominent one is the so-called Kernel Trick—that exist solely to take data that is not inherently linear in nature (cannot be fruitfully analyzed by linear methods) and transform it into a dataset that is. Using the Kernel Trick as an example to foreshadow some future posts on Support Vector Machines, the idea is to take data which cannot be separated by a line, and transform it (usually by adding new coordinates) so that it can. Then the decision rule, computed in the larger space, is just a dot product. Irene Papakonstantinou neatly demonstrates this with paper folding and scissors. The tradeoff is that the size of the ambient space increases, and it might increase so much that it makes computation intractable. Luckily, the Kernel Trick avoids this by remembering where the data came from, so that one can take advantage of the smaller space to compute what would be the inner product in the larger space.
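As a tiny illustration of the “add new coordinates” idea (a hedged sketch with made-up data, not the full Kernel Trick machinery), points inside and outside a circle cannot be split by a line in the plane, but become linearly separable after adding a squared-radius coordinate:

```python
import numpy as np

# Points labeled by whether they lie inside the unit circle; no line in the
# plane separates the two classes.
points = np.array([[0.1, 0.2], [-0.3, 0.1], [0.2, -0.2],    # inside  (+1)
                   [1.5, 0.3], [-1.2, 1.1], [0.4, -1.8]])   # outside (-1)
labels = np.array([1, 1, 1, -1, -1, -1])

# Add a third coordinate x1^2 + x2^2. In this larger space the classes are
# separated by the plane z = 1, i.e. by a linear decision rule.
lifted = np.hstack([points, (points ** 2).sum(axis=1, keepdims=True)])

w = np.array([0.0, 0.0, -1.0])   # normal vector of the separating plane (chosen by hand)
b = 1.0
predictions = np.sign(lifted @ w + b)

print(predictions)                     # [ 1.  1.  1. -1. -1. -1.]
print((predictions == labels).all())   # True: the lifted data is linearly separable
```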
Next time we’ll see how this decision rule shows up in an optimization problem: finding the “best” hyperplane that separates an input set of red and blue points into monochromatic regions (provided that is possible). Finding this separator is a core subroutine of the Support Vector Machine technique, and therein lie some interesting algorithms. After we see the core SVM algorithm, we’ll see how the Kernel Trick fits into the method to allow nonlinear decision boundaries.
Proof of the cosine angle formula
Theorem: The inner product $\langle v, w \rangle$ is equal to $\|v\| \|w\| \cos(\theta)$, where $\theta$ is the angle between the two vectors.
Note that this angle is computed in the 2-dimensional subspace spanned by $v$ and $w$, viewed as a flat plane, and that makes sense no matter the dimension of the ambient space.
Proof. If either $v$ or $w$ is the zero vector, both sides of the equation are zero and the theorem holds trivially, so assume both are nonzero. Also assume, for the moment, that $v$ and $w$ are not parallel, so that the vectors $v$, $w$, and $v - w$ form a genuine triangle with side lengths $\|v\|$, $\|w\|$, and $\|v - w\|$, where $\theta$ is the angle between the sides $v$ and $w$.
The law of cosines allows us to write

$$\|v - w\|^2 = \|v\|^2 + \|w\|^2 - 2 \|v\| \|w\| \cos(\theta)$$

Moreover, the left hand side is the inner product of $v - w$ with itself, and expanding it using the symmetry and linearity of the inner product gives

$$\|v - w\|^2 = \langle v - w, v - w \rangle = \|v\|^2 - 2 \langle v, w \rangle + \|w\|^2$$

Combining our two offset equations, we can subtract $\|v\|^2 + \|w\|^2$ from both sides and get

$$-2 \|v\| \|w\| \cos(\theta) = -2 \langle v, w \rangle$$

Which, after dividing by $-2$, proves the theorem whenever $v$ and $w$ are not parallel.
Now if $v$ and $w$ are parallel, then $\theta$ is $0$ or $180$ degrees and $w = cv$ for some nonzero scalar $c$. In that case $\langle v, w \rangle = c \|v\|^2$, while $\|v\| \|w\| \cos(\theta) = |c| \|v\|^2 \cos(\theta) = c \|v\|^2$, since $\cos(\theta)$ is $1$ when $c > 0$ and $-1$ when $c < 0$. The two sides agree, so the theorem holds in every case. $\square$
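For the skeptical, here is a quick numeric check of the expansion step used above (a sketch with made-up vectors, assuming numpy):

```python
import numpy as np

v = np.array([2.0, -1.0, 3.0])   # example vectors (made up for illustration)
w = np.array([0.5, 4.0, 2.0])

# The expansion <v - w, v - w> = ||v||^2 - 2<v, w> + ||w||^2 used in the proof:
lhs = np.dot(v - w, v - w)
rhs = np.linalg.norm(v) ** 2 - 2 * np.dot(v, w) + np.linalg.norm(w) ** 2
print(lhs, rhs)   # both print 28.25 (up to floating point)
```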