Error correcting a miscalibrated sensor

As you may know from my last post, I used to have a Winsen MH-Z19B CO2 sensor. In fact, I have 4 of them now, since after my little incident I was left with 0. Hopefully I’ll keep these ones alive.

As a refresher, this is a CO2 sensor that uses NDIR (non-dispersive infrared) to detect the concentration of CO2 in the atmosphere. Every 6 seconds, it performs a measurement, and to do so it needs a relatively accurate temperature reading. This is because the measurement method works by generating some heat, which emits far-infrared light through black-body radiation. That light is bounced around a chamber and measured by an IR sensor, much like the one in a thermal imaging camera or a motion sensor.

First attempt: Univariate linear functions

The particular sensor I got (and broke) was off in its temperature measurement by about 6 Kelvin. This meant that all its CO2 readings were off as well (according to my Temtop P1000 air quality measurement station). So naturally, my first idea was to assume (hope?) that the error is linear: a scale factor, maybe with an additional offset. If I could just compute the function f(x) = mx + b, I could easily correct the measurement it gave me. A line can be described by 2 points, so I would just need 2 measurements and have my function. Unfortunately, my measurements are pretty inaccurate, because CO2 concentration constantly changes, especially when I’m in the room, turning O2 and food into CO2, so a line through just 2 noisy points could be far off. So the next step was to turn to least-squares optimisation:

1. Collect a bunch of data points: (CO2 according to my sensor, CO2 according to Temtop reference sensor)
2. Optimise a linear function to have the least amount of error given all the data points.
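
For reference, the one-variable least-squares fit has a well-known closed form. The symbols here are not from the original post: x_i stands for a reading from my sensor, y_i for the matching reference reading, and n for the number of data points.

```latex
m = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
b = \frac{\sum_i y_i - m \sum_i x_i}{n}
```

Only four running sums are needed, which is what makes this version feasible to evaluate at compile time.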

The Temtop has the same sensor in it, so I figured it would be easy to match them up.

Temtop P1000

Of course, storing a bunch of data points as well as the code for computing least squares and also having to run that code on startup sounded like a really bad idea to me and my 8KB of program memory. Fortunately, there’s constexpr, the promise that C++ can be executed at compile time.

Unfortunately, however, since I was using C++11 and a rather old version of AVR-GCC (5.x), I could only really use constexpr functions that consist of a single return statement. This resulted in a rather convoluted implementation of least squares for one variable (see for yourself).
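
To give a flavour of what the single-return restriction does to the code, here is a minimal sketch, not the actual implementation linked above. The data points and all names (Sample, kSamples, slope, intercept, correct) are made up for illustration; the points lie exactly on y = 2x + 100 so the result is easy to check.

```cpp
#include <cstddef>

// Hypothetical calibration pairs, made up for illustration:
// x = CO2 from the miscalibrated sensor, y = CO2 from the reference, in ppm.
struct Sample { double x; double y; };

constexpr Sample kSamples[] = {
    {400.0, 900.0}, {500.0, 1100.0}, {600.0, 1300.0}, {800.0, 1700.0},
};
constexpr std::size_t kCount = sizeof(kSamples) / sizeof(kSamples[0]);

// C++11 constexpr functions may contain only a single return statement,
// so every sum is a recursion over the sample index instead of a loop.
constexpr double sum_x(std::size_t i) {
    return i == kCount ? 0.0 : kSamples[i].x + sum_x(i + 1);
}
constexpr double sum_y(std::size_t i) {
    return i == kCount ? 0.0 : kSamples[i].y + sum_y(i + 1);
}
constexpr double sum_xy(std::size_t i) {
    return i == kCount ? 0.0 : kSamples[i].x * kSamples[i].y + sum_xy(i + 1);
}
constexpr double sum_xx(std::size_t i) {
    return i == kCount ? 0.0 : kSamples[i].x * kSamples[i].x + sum_xx(i + 1);
}

// Closed-form least-squares slope and intercept for f(x) = m*x + b.
constexpr double slope() {
    return (kCount * sum_xy(0) - sum_x(0) * sum_y(0)) /
           (kCount * sum_xx(0) - sum_x(0) * sum_x(0));
}
constexpr double intercept() {
    return (sum_y(0) - slope() * sum_x(0)) / kCount;
}

// Applies the fitted correction to a raw reading.
constexpr double correct(double raw) {
    return slope() * raw + intercept();
}

// Forces compile-time evaluation: the fit recovers the line used above.
static_assert(slope() > 1.999 && slope() < 2.001, "slope should be about 2");
```

Because everything is constexpr, neither the data points nor the fitting code need to end up in program memory; only the two resulting constants are used at runtime.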

The problem of temperature

The solution worked perfectly. Or so I thought.

Since I had been doing all my measurements in my room at a pleasant 294-295 Kelvin, my corrective function worked perfectly inside that temperature range, but became wildly wrong at lower temperatures, like the less pleasant 288 Kelvin that my room would have after opening the window for a bit. Clearly, the error was not a linear function of one variable. It depended on at least two! I had to consider temperature.

The sad news here is that in order to do multivariate linear optimisation, you basically need matrix operations like multiplication, determinant, and inverse. Writing that in C++11 constexpr would prove to be a massive pain. Not impossible, mind you, and I have done the kind of crazy template metaprogramming required for this in the distant past, but I’d really prefer to keep that past distant. So I had no choice but to try and upgrade my compiler. It took me a while, but it turned out to be possible to upgrade AVR-GCC to a more reasonable version like 7.3.0, which nicely supports C++14 constexpr, where you can basically run any code without global side effects at compile time (woohoo!).
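
To see where the matrix operations come from, here is the textbook formulation of the two-variable fit. The notation is an assumption for illustration and does not come from the original code: x_i is the raw CO2 reading, T_i the temperature, y_i the reference reading, and w_0, w_1, w_2 the weights to be found.

```latex
\hat{y} = w_0 + w_1 x + w_2 T, \qquad
X = \begin{pmatrix} 1 & x_1 & T_1 \\ \vdots & \vdots & \vdots \\ 1 & x_n & T_n \end{pmatrix}, \qquad
\mathbf{w} = \left(X^\top X\right)^{-1} X^\top \mathbf{y}
```

The product X^T X is only a 3x3 matrix here, but computing and inverting it is exactly the part that is painful to express with single-return constexpr functions.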

Multivariate least squares

The final solution, which ended up actually working pretty decently, is based around a small constexpr matrix library I wrote, and an algorithm from Wikipedia. It is now used to correctly return CO2 values that agree with my reference sensor.
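
As a rough illustration of how much C++14 constexpr helps, here is a minimal sketch of a compile-time two-variable fit. It is not the matrix library mentioned above: instead of an explicit inverse, it solves the normal equations with Gauss-Jordan elimination, and it assumes the data is not degenerate (no zero pivots). The data and all names (Sample, Weights, kSamples, fit, correct) are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical calibration data, made up for illustration:
// {raw sensor CO2 in ppm, temperature in Kelvin, reference CO2 in ppm}.
// The reference column follows ref = 1.5 * raw + 4 * temp - 1000 exactly.
struct Sample { double raw; double temp; double ref; };
struct Weights { double w[3]; };

constexpr Sample kSamples[] = {
    {400.0, 288.0, 752.0},   {600.0, 290.0, 1060.0},  {800.0, 294.0, 1376.0},
    {1000.0, 295.0, 1680.0}, {1200.0, 289.0, 1956.0}, {500.0, 293.0, 922.0},
};
constexpr std::size_t kCount = sizeof(kSamples) / sizeof(kSamples[0]);

constexpr double abs_val(double v) { return v < 0.0 ? -v : v; }

// Column j of the design matrix: constant term, raw reading, temperature.
constexpr double feature(const Sample& s, std::size_t j) {
    return j == 0 ? 1.0 : (j == 1 ? s.raw : s.temp);
}

constexpr Weights fit() {
    // Build the augmented normal equations [X^T X | X^T y].
    double a[3][4] = {};
    for (std::size_t k = 0; k < kCount; ++k) {
        for (std::size_t i = 0; i < 3; ++i) {
            for (std::size_t j = 0; j < 3; ++j)
                a[i][j] += feature(kSamples[k], i) * feature(kSamples[k], j);
            a[i][3] += feature(kSamples[k], i) * kSamples[k].ref;
        }
    }
    // Gauss-Jordan elimination with partial pivoting.
    for (std::size_t col = 0; col < 3; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < 3; ++r)
            if (abs_val(a[r][col]) > abs_val(a[pivot][col])) pivot = r;
        for (std::size_t j = 0; j < 4; ++j) {
            const double tmp = a[col][j];
            a[col][j] = a[pivot][j];
            a[pivot][j] = tmp;
        }
        for (std::size_t r = 0; r < 3; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (std::size_t j = col; j < 4; ++j) a[r][j] -= f * a[col][j];
        }
    }
    // The matrix is now diagonal, so each weight is a single division.
    Weights out = {};
    for (std::size_t i = 0; i < 3; ++i) out.w[i] = a[i][3] / a[i][i];
    return out;
}

// Evaluated entirely at compile time; only three constants reach the binary.
constexpr Weights kWeights = fit();

constexpr double correct(double raw, double temp) {
    return kWeights.w[0] + kWeights.w[1] * raw + kWeights.w[2] * temp;
}
```

One caveat worth checking on a real AVR toolchain: avr-gcc of that era treats double as a 32-bit float, and the normal equations can be poorly conditioned when temperatures in Kelvin sit far from zero, so subtracting a fixed offset from the inputs before fitting may be worth trying.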

Is this solution correct? I don’t know. The temperature sensor inside is probably a thermistor, which has a logarithmic sensitivity curve.

Thermistor Curve

It’s possible that the function seems linear only in the narrow range I measured things in, but is actually logarithmic over a wider range, much like how Earth looks flat when looking at a narrow slice of its surface (no, I’m not going to hyperlink to flat earthers here). If that’s the case, the problem is still pretty easy to solve, even with the ordinary least-squares method: I’d just make the logarithm of a value one of the variables and compute its weight. In fact, I could do that right now to see whether the function is in fact logarithmic, but for the temperature range that’s relevant to me, the function is linear enough, and now I’ll move on to the next thing :).
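
For the curious, the logarithmic variant would only change one column of the fit. This is a sketch under the assumption that the temperature T is the value whose logarithm matters; x is the raw CO2 reading and w_0, w_1, w_2 are the weights, none of which are names from the original code.

```latex
\hat{y} = w_0 + w_1 x + w_2 \ln T
```

The model is still linear in the weights, so the same least-squares machinery applies with ln T_i in place of T_i in each data row.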