My very first production bug

The year was 2013 and I was smack in the middle of my master’s degree. Coding-wise, we were taught Fortran 95 and not much else. To get some hands-on experience, I decided to take up a summer internship. I ended up at a technical consulting company that contracted for the automotive industry.

This firm had a hardware demonstrator that was supposed to show off their autonomous driving algorithms (“fully autonomous driving is just around the corner”). It consisted of a few 1:43-scale Kyosho dNaNo toy cars racing around a track filmed by a camera. A real-time vision system translated the positions of the cars on the track into 2D coordinates. It was based on little more than OpenCV’s bounding-box algorithm plus a Kalman filter.

The new camera

Now, the camera’s framerate of 20 Hz wasn’t fast enough to track the cars’ positions accurately. So, the supervisor purchased a faster camera. Mysteriously, after the Kalman filter’s update frequency was increased to match the new camera’s framerate, the filter became incredibly sluggish. The cars were racing down the track, but on screen they barely moved. That was quite the head-scratcher: How come a better camera makes the system worse?!

This is where I come in. The vision system wasn’t exactly beautifully crafted C++ code. Most of it was C-style anyway, and worse, some parts were generated by Matlab. It was clearly the product of a string of summer interns hacking away at the problem. The latest intern had done the camera upgrade and then left.

Debugging numerical algorithms is always a pain, but in a legacy codebase where all the variables have names like a_gen and k_j1, it was hell. What to do? Since C++ doesn’t have built-in matrix types like the ones I was used to from Fortran, I grabbed a matrix library called Armadillo, printed some of the matrices used in the Kalman filter to the screen, and looked at their values.

The find

At this point, you should know that the Kalman filter algorithm consists of two parts: prediction and update. As it turned out, all of the ‘update’ quantities and some of the ‘prediction’ quantities were zero, which caused the sluggish behavior of the filter.

It took me an embarrassingly long time to figure it out, but the solution was simple. One of those predecessor interns (presumably) had changed

#define FRAMERATE 20.0

to

#define FRAMERATE 100

Note the missing .0 at the end: the preprocessor pastes in the literal 100, which the compiler treats as an int. Given that the sampling time was computed by

float ts = 1 / FRAMERATE;

it turned out that ts was always 0. Dividing one int by another is integer division, which truncates, so 1 / 100 evaluates to 0 before the result is ever converted to float. This in turn made the Kalman filter predict the same position over and over again, hence the cars barely moving on screen. Changing the preprocessor directive to

#define FRAMERATE 100.0

solved the problem immediately. Result? The tracking worked much better than with the old camera and I made a good impression in my first week of the internship.

Lessons learned?

  1. Avoid #defines for constants like the plague. Use static constexpr instead.
  2. Every company I worked for after said internship violated lesson 1 at some point.

Follow the discussion on HN.