In this blog post I'm going to post something which is rather unrelated to my usual topics. I'm going to talk about how can minor changes in concept when coding, can have dramatic impact on your code performance. I also introduce a technique that I “developed myself” while coding the problem. When I say “developed myself”, I mean that I actually took time to code all the ways people describe the problem’s solution – although all seem very similar, the performance difference is huge.
While working on my GSoC project, I needed a fast predicate to check if a point is inside a polygon (specifically, a triangle). The test doesn't have to be accurate, but it must be sound in the sense that when it states a point is outside, then it must be outside. To do so, I decided to use bounding box testing - see if the point is in the bounding box of the polygon. Computing the bounding box of a group of points should be easy - just keep track of minimal and maximal X and Y and you should be done. So, how much can you screw it up?
Well, apparently, you can screw pretty damn hard. By coding without thinking, you can make this twice as inefficient without even noticing! Thinking before you code, to try and merge cases, can apparently be very productive. My solution will be referred to through this post as the QuickBox solution.
The QuickBox test uses an improved version of a common idea. The usual checks see if in a point, one of it's components is larger/smaller than the matching component in all of the triangle's points. The implementation here takes this one step further, under the assumption we are willing to say that a triangle made of 3 collinear points contains no point at all (not even it's own points). This is fair for most triangulation libraries, that simply don't accept such triangles.
The idea is like this: If all the points of the triangle have the same boolean result for testing order (using <=) against a component of the point (for all i, Pt[i].x <= P.x is the same) then the following cases are possible:
- All have the same value of the component. But then all the points of the triangle are collinear so it's practically empty!
- All triangles points have the component larger than the tested point. So the point is outside!
- All triangles points have the component smaller than the tested point. So the point is outside!
Note that principle explained above is also applicable for polygons with more than 3 vertices.
Let the points of the triangle be A, B and C. Let P be the point we are testing.
This is what people would usually do if they translate directly the description “See if all polygon points are on one side of the input point (check that for each side)”
((Ax <= Px) && (Bx <= Px) && (Cx <= Px)) ||
((Ax >= Px) && (Bx >= Px) && (Cx >= Px)) ||
((Ay <= Py) && (By <= Py) && (Cy <= Py)) ||
((Ay >= Py) && (By >= Py) && (Cy >= Py))
Total Cost: 12 double boolean operations, 11 binary boolean operations
This is what people would usually do if they translate directly the description “Find the the border on each side of the box, see if the point is beyond it”
ABx ACx BCx
xMin := (Ax <= Bx) ? ((Ax <= Cx) ? Ax : Cx)) : ((Bx <= Cx) ? Bx : Cx))
yMin := (Ay <= By) ? ((Ay <= Cy) ? Ay : Cy)) : ((By <= Cy) ? By : Cy))
ABy ACy BCy
// OPTION 1 - Requires computing all of ABx, ACx, BCx, ABy, ACy, BCy
xMax := (! ABx) ? ((! ACx) ? Ax : Cx)) : ((! BCx) ? Bx : Cx))
yMax := (! ABy) ? ((! ACy) ? Ay : Cy)) : ((! BCy) ? By : Cy))
// OPTION 2
xMax := (Ax >= Bx) ? ((Ax >= Cx) ? Ax : Cx)) : ((Bx >= Cx) ? Bx : Cx))
yMax := (Ay >= By) ? ((Ay >= Cy) ? Ay : Cy)) : ((By >= Cy) ? By : Cy))
Compute: Px <= xMin || Px >= xMax || Py <= yMin || Py >= yMax
I'll assume that a trenary op is just one boolean operation and I will also ignore the need to save the values computed in the process such as ABx, etc. And even with that is still costs a lot!
- Option 1: 6+4 double boolean operations, 15 binary boolean operations. Actually, this method is rather inefficient, unless binary boolean operations are much much cheaper than double ones.
- Option 2: 8+4 double boolean operations, 11 binary boolean operations. Exactly the same as the naïve method!
Note: separating the computation to one coordinate a time, can cut memory consumption by half.
QuickBox implementation, This is my method!
This is what people would usually do if they translate directly the description - “See if the point is on the same side of all the polygon-points”
xPBorder := Bx <= Px
yPBorder := By <= Py
Compute: (((Ax <= Px) == xPBorder) && ((C->x <= Px) == xPBorder)) ||
(((Ay <= Py) == yPBorder) && ((C->y <= Py) == yPBorder))
Total Cost: 6 double boolean operations (7 if we can’t save values (not 8! see below)), 7 binary boolean operations.
Note that you can expand this to polygons of much more vertices, using the trick above to avoid repeating computations, while keeping with only 1 variables which is neat :) Work on one coordinate (component each time) and keep an accumulating variable that after it’s initialized with the first comparison, you just compare to it! When done with that coordinate, if indeed you found no proof that it’s outside, pass on to the next coordinate and repeat.
It’s pretty easy to see that the QuickBox method beats the hell out of the competition! Less memory, less computations, and all because we checked “all points are on the same side” and not “all points are on one side” (and checked that for each side). So, when someone tells you how to implement an algorithm, try to rephrase his sentence. It can be critical!
Just some context about this post - I’m implementing a geometric algorithm that was given in 55 lines of pseudo code with ambiguous meanings. After passing 2500 lines of implementation (real C code), and after having to actually add many cases which were dealt with because the algorithm was phrased ambiguously enough with no hint to these cases, I started being very picky about how people describe their algorithms :P I spent 1.5 months of my GSoC on this, which is way more than I expected.
So do us all a favor – if you know how something works, describe it exactly and clearly, so other people can implement it correctly and efficiently. It’s preferable to have 100 lines of code that are readable in 15 minutes, instead of 15 lines of pseudo code which are quickly read but take a 100 minutes to understand! (But don’t over-do it! Bombarding with details will make people like me fall asleep when they read your article…)
[Final note: I don’t have any criticism at the original article author – he did explain it so that people from the field would understand (and eventually, even I understood so it shows he did his job very well). I just came from a different field since this is a side dependency of my project, and not it’s main part… This simply made me aware of how much phrasing of an algorithm can change it’s implementation for people who don’t think exactly like you]