In the past 2 months I’ve been busy building a small CNC router, which I’ll write about some other time. Today’s topic is about how, once you have a CNC machine, or even just a bunch of steppers and a GRBL controller, you can use those things to make music. Obviously, the sound quality of your CNC router is the most important aspect (see the end for why).

The compiler pipeline

I’ve built this code as a pipeline of many small stages, because figuring out each step was a bunch of work that I wanted to individually debug and observe. The stages are as follows:

  1. Parse binary MIDI file.
  2. Turn relative time offsets into absolute time stamps.
  3. Combine the MIDI events from all tracks and sort them by absolute time.
  4. Turn relevant MIDI events into a homogeneous intermediate representation.
  5. Compute concurrently active notes along with their durations.
  6. For each active note list:
    1. Compute sound frequency for the notes.
    2. Compute feed rate (now the stepper motor settings matter).
    3. Compute per-axis relative movement for each note.
    4. Compute total feed rate for all axes together.
  7. Compute absolute positioning (now machine dimensions matter), ensuring that we don’t overshoot the safe machine boundaries.
  8. Generate G-Code.

Parse MIDI

This step is fairly straightforward. I used a concise but almost-complete and well-written spec along with some MIDI files I downloaded. I only implemented the things I actually found in the MIDI files (which ended up being most of the spec).

There are some interesting things to point out, though.

14 bit numbers

Pitch Bend quantity is a 14 bit number distributed over 2 bytes. My guess is that it makes it easier to deal with it when all you have are signed chars (i.e. 7 bit positive maximum). It’s still a bit odd, because there are perfectly normal 16 and 24 bit quantities all over the place. Pitch Bend isn’t even signed, so I don’t know what they were smoking when writing the MIDI spec.

Utterly useless optimisation

Something missing (unless I missed it?) from this spec is a thing called “running status”. The “status” is what MIDI calls the event type, which is a roughly 3 bit number (0x8 - 0xf) at the beginning of each event, usually followed by a 4 bit number that is the MIDI channel (so 1 byte in total). That’s all pretty standard for binary protocols. Now, the fun part is that the first bit of that status byte indicates whether it’s a status byte at all. If not, then you are to use the previous status.

Sounds good, right? Saves you one whole byte, which is definitely important when that status byte makes up about 1/4th of the entire MIDI event. Or so you would think! The reality is that most actual music is going to be a series of events like:

  • 0.0s: Note 52 ON
  • 0.1s: Note 52 OFF
  • 0.1s: Note 54 ON
  • 0.2s: Note 54 OFF
  • etc.

Real world midi files (or data streams) are going to for the most part have lots of ON/OFF events directly following each other, making the number of times you can actually do this optimisation pretty small (mostly chords, which are relatively rare). Sadly, it makes the code you have to write for it so much uglier and more complicated, because without this, you could just parse each event independently in a loop. Now, you need to keep state of that previous status during the loop. I’m sure someone thought this was a good idea, but the reality is that I haven’t actually found this be used inside the MIDI events. The only reason I implemented it is because one of the files I wanted to process used it in the Copyright or Song Title events for some weird reason.

Relative to absolute

MIDI is primarily a streaming protocol, so rather than having the device keep an accurate high-res clock, the stream contains time deltas so all it needs to have is an accurate timer (much lower bit width for the same accuracy). For our further processing, and to be able to combine tracks into a single stream, we need to know the absolute time offset from the start.

Combine and sort

We’re not actually interested in MIDI tracks at all. We’re potentially interested in channels, if we want to filter them out (e.g. drums don’t translate well to stepper motors), but channels are written inside the events. Tracks also just kinda exist for some reason. In MIDI format 1 (see spec for other formats), the first track contains all the tempo changes (why?!), and the other tracks contain only notes and no tempo changes (why?!).

So for further processing, we drop the MIDI header (except we keep track of the time division constant which we’ll need later to compute real world time in seconds from the abstract time offsets) and the fact that tracks exist and just keep a single long list of events, ordered by absolute time.

Create IR 1

There are a whole bunch of different MIDI events, some of them are “meta” events (like tempo change) which affect all channels, and some are “channel” events (like note on/off or pitch bend). There’s also System Exclusive events, but I have no idea what those are and I don’t need them. Ultimately I want a stream of commands, so our intermediate representation is going to be a single homogeneous type called Note with the following attributes:

  • Absolute time (time divisions)
  • On/off (boolean)
  • Note number (8 bit integer)
  • Velocity (8 bit integer)
  • Current tempo (24 bit integer)
  • Channel (4 bit integer)
  • Pitch bend (14 bit integer)

This gets rid of tempo events (each Note knows the current tempo) and pitch bends, leaving us with a sequence of simple events.

Pitch bends

Pitch bends are interesting enough to warrant a separate mention. Each channel has its own “pitch bend wheel” which maintains a state that can be updated anytime. E.g. you could have no notes sounding and spin the wheel up and down and of course you would observe nothing. Still, we need to keep track of the wheel state for each note. What I ended up doing is: I keep track of the wheels for each channel, and whenever a note is sounded, I record the current wheel state in that note. If the note is turning OFF, I record the wheel state as well, but really just because I don’t like seeing random wheel states in my IR. It doesn’t affect the code. Whenever the wheel state is modified, I turn the currently active note OFF, then immediately turn it back ON with the updated pitch bend. This way, each note individually knows its own bend.

Note that this logic currently only works with single notes (per channel) doing bends. If you’re bending chords, the code becomes a bit more complicated because now you need to track active notes (which we need to do anyway, later, but not yet in this phase).

Active notes (IR 2)

Here, we simply iterate over the events, record which notes are currently on, and whenever time advances, we emit an “active note” event that contains all active notes and their duration. Some notes may be active for longer, which just means that in the next “active notes” event, that note will also be in the list of active notes.


Absolute Time Event Active Notes Event emitted
0s ON 52 52  
0s ON 54 52, 54  
3s OFF 52 54 3s, [52, 54]
4s OFF 54   1s, [54]

We also compute the real world clock time from the ticks-based quantity in the MIDI events and turn it back into relative times, i.e. duration in seconds.

Relative positioning

This is broken into:

  1. Compute frequency, which is based on note 69 being the 440hz pitch standard: 2^(note-69+bend) / 12 * 440. By the way, pitch bend goes from 0 to 16383 (being a 14 bit unsigned int), and 8192 is considered “no bend”, making it actually sort of signed except the sign bit is the wrong way around. How much bend a device actually performs is up to it, but a standard seems to be 2 notes, so I divide the number by 4096 to get the bend value in this formula.
  2. Compute per-axis feed rate: freq * 60 / ppu. Feed rate is in mm/min, so the frequency is multiplied by 60, and then divided by pulses per unit (i.e. per mm) resulting in the feed rate needed for that exact sound frequency (= pulses per second).
  3. Compute relative movement based on feed rate and duration (the machine doesn’t know about moving for a certain amount of time, it wants positions).
  4. Compute combined feed rate.

Combining feed rates is an interesting one worth pointing out more. Thus far, we know for each individual axis what feed rate we need and where we need to go. However, we need to move the machine simultaneously in all directions but only have a single feed rate setting for a movement instruction. Turns out, high school math is useful: all we need is the Euclidian norm sqrt(sum(squares of feedrates)) and we’re done :).

Absolute positioning

Since we don’t want to run the machine into the wall because then the steppers would stall and it would stop playing nice music (and also the machine would break, or if you’re lucky enough to have limit switches, turn off), we need to limit the movement to a safe non-Euclidian hypercube (in case we’re using A/B/C rotational axes in addition to the regular X/Y/Z directional axes). Turns out, the non-Euclidian part doesn’t matter and we can treat it as a Euclidian vector space.

So, this part just iterates over the relative moves, turns them into absolute moves (by keeping track of current position), and ensures that we change direction just before one of the axes would hit its limit.

Generate G-Code

The output from the last step is essentially G-Code: absolute positioning commands with a feed rate. That’s G01 in G-Code, e.g.:

G01 X6.9180 Y0.6631 Z0.1068 A0.1459 F55.6962

One more thing though: sometimes in music, there is no music. This is called a “rest”. Fortunately, G-Code also has a concept of “dwell”, i.e. do nothing for a while. It also turns out to be somewhat precise, so rests actually kinda work. Except that there’s more magic in G-Code that I don’t understand (something with command sections, which sound a lot like basic blocks, but who knows) which means that it’ll always dwell a bit longer than instructed. To compensate, all rests are 0.2 seconds shorter than they should, and are omitted if shorter than 0.25 seconds.

And that’s all :). Simple enough.

Why sound quality matters.

As you may have heard, the name of the game in any form of machining is Rigidity ([ridgiddiddi] if you’re American). The middle name of that game, though, is vibration dampening, and while it’s much less assonant, it’s still very important. The problem of vibrations is mainly that of resonance. If your very rigid machine is brought into resonance by the spindle frequency or indeed stepper motor frequency (much less likely, but still), you’ll find that the cuts it makes are going to be very messy, because the tool bit and workpiece are moving with respect to each other. A form of resonance is often called “chatter”.

So, finding the resonance frequencies in your machine is pretty important. How does music help? Well, music goes through lots of different frequencies, and humans have learnt, through many observations, what music ought to sound like when played correctly. If you play music on your CNC machine and it sounds off, that’s probably a resonance that you should fix. Once all the notes sound good on your machine, you can be fairly confident that your cuts will be good as well.

Although of course, it’s mostly just fun :).