Reverse-engineering Yamaha's proprietary MIDI networking protocol
At my church, I am privileged to use some very expensive professional technology. One such example is my ever-favourite, the Yamaha LS9-32 digital audio mixing console. It is designed and engineered brilliantly. However, it predates the smartphone!
Pre-smartphone technology
Stuff this old is always a bit special. It has its own software and its own way of doing things. For Yamaha in this era, that was via the Studio Manager ecosystem. Studio Manager was a really cool bit of software that let you load each of their consoles as a plugin, so if you were fortunate enough to have a whole studio of their gear, it could all be synchronised and work together as one big session.
Neither Studio Manager nor the plugins actually handled the connection to the hardware, though. That was handled by MIDI or USB-MIDI drivers for some of their older and more entry-level consoles, which lacked the fancy stuff of their really high-end consoles. The LS9 was just professional enough to have a proper Fast Ethernet port, and thus supported their super-cool and fully-proprietary custom MIDI-over-TCP protocol.
In this post, I'll detail what I know about reverse-engineering it for my own purposes.
MIDI-over-TCP
These days, there are several solutions for transporting MIDI 1.0 over a network connection, the most ubiquitous and well-supported of which would be RTP-MIDI. Alas, the LS9 (and other similar Yamaha consoles) predate RTP-MIDI by a year or two. Thus, Yamaha had to engineer their own solution, which I will be referring to as YET-MIDI (Yamaha Ethernet-Tunnelled Musical Instrument Digital Interface).
Before we get too far into the weeds of what it does and how it works, why am I bothering about this in the first place?
Motivation
To be honest, I haven't used the LS9 in many years. It was on its last legs in 2019 when we finally replaced it (after nearly 15 years of 24/7 operation!) with a Yamaha QL5. Whilst they are rather similar, the QL5 was released in the age of the smartphone, and as such benefits from a companion app called MonitorMix, which the musicians can use on stage to adjust their monitor mixes.
It's a very handy app, for two reasons:
- It takes a load off the solo sound engineer, as the band collectively becomes their own monitoring engineer, leaving the sound engineer to focus on front of house
- It gives ultimate control to each band member to wilfully adjust and perfect their own mix to their taste
Hence, I wanted to essentially "backport" this functionality to the LS9.
The plan
How could I pull this off, if the LS9 doesn't support it? Well, it does support networked remote-control via the Studio Manager software with the LS9 plugin and Yamaha's Network-MIDI driver.
The problem
The aforementioned driver was made for Windows XP. It doesn't cope well with being installed and uninstalled on modern devices, and it's ugly to have a dependency on external software and libraries for what ultimately is just a network connection to a piece of hardware.
The solution
Thus, I set out to replicate the functionality of this driver with my own application, which would connect to the console just as Studio Manager would've done, and would also provide an API that smartphones (running my own companion app) could connect to and thus replicate the desired experience.
The challenge
I didn't know how to reverse-engineer anything. I still don't really know how. Let alone complicated non-standard Windows device drivers! Should I look into the network drivers section, or the MIDI drivers section, or the Control Panel applet, or...
The journey
Before we get to the technical details and findings, I want to share a little bit of the journey that I went on to figure this out.
First, I found the MIDI Parameter Change List on Yamaha's website and pored over it to find all the details that I needed. It looked like it supported everything I wanted, so I acquired a USB-to-MIDI adapter to interface with the console. At first it went really well, and I started to understand how MIDI System Exclusive (SysEx) messages are formatted and sent, as well as how Yamaha had formatted their own protocol for communicating binary data using them.
In time, I found that it didn't quite handle everything that I wanted it to, such as which channel is currently selected. I knew the console could do this, because it worked in Studio Manager... using the Network-MIDI driver.
For the reasons mentioned earlier, I didn't want to actually use this driver, so I booted up Wireshark and started sniffing packets. This worked for a while, and I was eventually able to get something open and working... for a few seconds.
The console would always terminate the connection after 3 to 5 seconds or so. I couldn't figure out why. It worked perfectly up until that point. And then just stopped. The console wouldn't even send a message to say, "chat over, I'm out." It would close the connection at the TCP level!
At that point, I knew there was something I was missing. Some key piece of information, some essential detail. I searched high and low, everywhere I could, across the entire web. I found one other person who was doing similar things and seemed to give up pretty quickly, and another person who seemed to be doing almost exactly what I was doing. They never figured it out either though.
There was one other thing that worked... someone had managed to reverse-engineer it well enough to sell a fancy companion app that did everything I wanted (and more) for many of Yamaha's older consoles. As it was a commercial product, though, I didn't think they'd be sympathetic to me asking for their insight or source code for free.
All-in-all, I was on my own, and I had to learn to do it all myself. I downloaded the industry-standard free software reverse-engineering tool, Ghidra, and set about doing my best to figure out what the Network-MIDI driver was doing.
I spent a year, on and off, trying and failing to make any notable progress. Or any progress. I literally got nowhere.
Eventually, I shelved the project, and shortly thereafter, our Yamaha LS9-32 finally kicked the bucket (ironically, whilst sitting unused in a cupboard for a year).
Hope
There was always this other project, called Companion, which had forum posts alluding to an ability to connect to Yamaha consoles over a network connection. However, newer Yamaha consoles use a different protocol, called SCP or RCP, which is very similar to the OSC protocol.
Late one night, I got an urge to investigate it one more time. The Yamaha control plugin currently in their software only uses SCP/RCP, though I managed to find (and download!) the older plugin that supported MIDI communication... over a network connection!
It wasn't the source code, but I was determined to reverse-engineer it if need be. As it turned out, their plugins are written in JavaScript and executed by Node.js, which I happen to be rather familiar with, thanks to my day job.
Very fortunately, they hadn't bothered to obfuscate any of the JavaScript, only to minify it, so I could paste it into an online formatter, tidy it up, put it back in my IDE of choice and have a good scroll through...
As it was not obfuscated, all the variable names were untouched, and even some comments stuck around! Almost immediately, I saw a comment that sparked enough inspiration. It made mention of a "heartbeat". I think I spent a good 30 seconds just facepalming at how simple and obvious it was. I went back through my notes that I'd taken whilst sniffing packets in Wireshark, and I saw that I had even marked off one regularly-repeated packet as maybe being a heartbeat...
With that last bit of info, I think I can successfully explain enough about the protocol to get basic functionality working.
Please note that this only applies to the Yamaha LS9 series of consoles!
YET-MIDI
Communication is made via TCP, using port 12300. It is a bit strange in that you are required to do things at the application level that are also handled at the TCP level.
Start a session by opening a TCP connection to port 12300 on the console (from any dynamic port). Be sure to also be listening for, and accepting, TCP connections on port 12300, as the console will open a reciprocal connection to you. Use the connection that your application opened to transmit to the console; the connection the console opened will be used to receive data from it.
When receiving data, be sure to echo (send) it back to the console, otherwise the connection will be closed immediately! Likewise, when you transmit data, the console will echo back what you sent down the TCP connection you opened.
The last and most important part is: send a heartbeat message every second or so. The console will abruptly close the connection if it misses a few in a row.
Initialisation
We start by sending some synchronisation and startup headers. I have no idea what they mean, I just copied them from the packet traces I captured of Studio Manager, and it seems to work:
-> 00000010200000000100000020000003
-> 000000102300000019e7000000000000
These are sent over the first TCP connection, the one that Studio Manager (or your application) opens to the console. I also noticed Studio Manager sending this message every second or so:
-> 000000104000000000000000ffffffff
The console will then open a TCP connection to your application, where it expects the following data:
-> 000000102100000001007777a0000003
-> 00000010240000000000000000000000
Once both connections are open, we can start sending and receiving MIDI! For example, this captured packet carries the heartbeat message described next:
<- 0000001b1600000000000007ffffffff00000007f043103e127ff7
The MIDI for the heartbeat message (to be sent by your application every second or so) is:
F0 43 10 3E 12 7F F7
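Putting the pieces together, a session might look roughly like the Python sketch below (standard-library asyncio only). It replays the captured initialisation messages verbatim and echoes everything arriving on the console's reciprocal connection back down that same connection. It also discards the console's echoes of what we send, and sends the captured heartbeat packet once a second. Several details are my own assumptions rather than anything confirmed from Yamaha's driver: that the echo travels on the connection it arrived on, that re-sending the periodic 0x40 packet alongside the heartbeat is harmless, and the names `run_session` and `handle_incoming`.

```python
import asyncio

CONSOLE_PORT = 12300  # the console listens here
LOCAL_PORT = 12300    # the console connects back to us here

# Replayed verbatim from Studio Manager captures; their meaning is unknown.
INIT_TO_CONSOLE = [
    bytes.fromhex("00000010200000000100000020000003"),
    bytes.fromhex("000000102300000019e7000000000000"),
]
INIT_ON_RECIPROCAL = [
    bytes.fromhex("000000102100000001007777a0000003"),
    bytes.fromhex("00000010240000000000000000000000"),
]
# Studio Manager also sent this every second or so (assumed optional).
PERIODIC_0X40 = bytes.fromhex("000000104000000000000000ffffffff")
# Captured packet carrying the heartbeat SysEx F0 43 10 3E 12 7F F7.
HEARTBEAT = bytes.fromhex("0000001b1600000000000007ffffffff00000007f043103e127ff7")
HEARTBEAT_INTERVAL = 1.0  # seconds; the console drops us after a few misses


def handle_incoming(data: bytes) -> None:
    """Hypothetical hook: parse the console's packets here."""
    print(f"received {len(data)} bytes")


async def on_console_connected(reader: asyncio.StreamReader,
                               writer: asyncio.StreamWriter) -> None:
    """Serve the reciprocal connection that the console opens to us."""
    for packet in INIT_ON_RECIPROCAL:
        writer.write(packet)
    await writer.drain()
    while data := await reader.read(4096):
        # Echo every byte straight back first, or the console closes the session.
        writer.write(data)
        await writer.drain()
        handle_incoming(data)
    writer.close()


async def send_heartbeats(writer: asyncio.StreamWriter) -> None:
    """Transmit the heartbeat (and the 0x40 packet) on our outbound connection."""
    while True:
        writer.write(HEARTBEAT + PERIODIC_0X40)
        await writer.drain()
        await asyncio.sleep(HEARTBEAT_INTERVAL)


async def run_session(console_host: str) -> None:
    # Listen first, because the console connects back as soon as we connect.
    server = await asyncio.start_server(on_console_connected, "0.0.0.0", LOCAL_PORT)
    async with server:
        reader, writer = await asyncio.open_connection(console_host, CONSOLE_PORT)
        for packet in INIT_TO_CONSOLE:
            writer.write(packet)
        await writer.drain()
        heartbeat = asyncio.create_task(send_heartbeats(writer))
        # Read and discard the console's echoes of what we transmit; this loop
        # ends when the console closes our outbound connection.
        while await reader.read(4096):
            pass
        heartbeat.cancel()
        writer.close()
```

Running `asyncio.run(run_session("192.0.2.10"))`, with that placeholder address swapped for the console's real IP, would start a session. Nothing else on the machine (such as the original Network-MIDI driver) can already be listening on port 12300. The echo is sent before `handle_incoming` runs, so a slow handler can't delay it.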
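The protocol seems to be described by this format, where each %08X is a 32-bit number written as 8 hex digits and %s is the MIDI message in hex:
"%08X16000000%08XFFFFFFFF%08X%s"
- The first number is the total length of the packet in bytes: the 20-byte header plus the MIDI message (equivalently, the MIDI message length in hex digits, plus 40, divided by 2)
- The second number is the MIDI message length, in bytes
- The third number is the same as the second
For the heartbeat packet above, that gives 0x1b (27 = 20 + 7) and 7. The "divided by 2" only appears when counting hex digits rather than bytes, so there is no need to suppose that the protocol operates on 16-bit "words". The 16-byte initialisation messages likewise start with 00000010 (16).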
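Working in bytes, the framing can be sketched in Python as follows. `frame_midi` builds the 20-byte header from the format string above. `split_packets` cuts a TCP receive buffer into packets, assuming that every packet, including the 16-byte initialisation ones, starts with its own total length. The function names, and that assumption, are mine. Because TCP is a byte stream, a single read may hold several packets or only part of one, which is why the buffer is kept between reads.

```python
import struct
from typing import Optional

MIDI_PACKET_TYPE = 0x16000000  # the "16000000" field seen on MIDI-carrying packets
HEADER = struct.Struct(">5I")  # five big-endian 32-bit fields = 20 bytes


def frame_midi(midi: bytes) -> bytes:
    """Wrap a raw MIDI message in the observed 20-byte header."""
    n = len(midi)
    return HEADER.pack(HEADER.size + n, MIDI_PACKET_TYPE, n, 0xFFFFFFFF, n) + midi


def split_packets(buffer: bytearray) -> list[bytes]:
    """Remove and return every complete packet at the front of `buffer`.

    Assumes the first 4 bytes of each packet hold its total length in bytes;
    an incomplete trailing packet is left in `buffer` for the next read.
    """
    packets = []
    while len(buffer) >= 4:
        (length,) = struct.unpack_from(">I", buffer)
        if length < 4:
            raise ValueError(f"implausible packet length {length}")
        if len(buffer) < length:
            break  # wait for the rest of this packet
        packets.append(bytes(buffer[:length]))
        del buffer[:length]
    return packets


def midi_payload(packet: bytes) -> Optional[bytes]:
    """Return the MIDI bytes carried by a packet, or None for other packet types."""
    if len(packet) < HEADER.size:
        return None  # e.g. the 16-byte initialisation packets
    _total, kind, n, _marker, _n_again = HEADER.unpack_from(packet)
    if kind != MIDI_PACKET_TYPE:
        return None
    return packet[HEADER.size:HEADER.size + n]
```

As a check, `frame_midi(bytes.fromhex("F043103E127FF7"))` reproduces the captured heartbeat packet byte for byte.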
SysEx
Yamaha exclusively (hah) uses MIDI System Exclusive messages here, as the general MIDI commands (note on, note off) are largely irrelevant and far too limited for what actually needs to be transmitted. Even the Parameter Change and Control Change tables are far too small to handle the complexity of a digital audio mixing console, although Yamaha does support Control Change, Parameter Change, and even NRPN (Non-Registered Parameter Number) messages over their regular MIDI 1.0 ports.
MIDI 1.0 System Exclusive messages are signified by the F0 byte, followed by any number of manufacturer-defined data bytes, and concluded by an F7 byte. Yamaha adds a 4-byte header to identify themselves and the console, and to specify which ID (and MIDI channel, if using the MIDI 1.0 ports) the console is currently set to. By default, this is ID 1 (or 0, as programmers do 0-indexing) and MIDI channel 1.
After the Yamaha-specific header, refer to the Parameter Change List spreadsheet, available from Yamaha's website, for how to interpret the rest of the bytes in the message.
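As a sketch, wrapping a message body in the SysEx framing and Yamaha header might look like this. The header bytes 43 1n 3E 12 are read off the heartbeat message, where 43 is Yamaha's MIDI manufacturer ID. Two further points are assumptions based on that single capture: that the low nibble of the second header byte is the console's ID, and that 3E 12 are fixed for the LS9. `build_sysex` and `parse_sysex` are hypothetical names.

```python
YAMAHA_ID = 0x43  # Yamaha's MIDI manufacturer ID
MODEL_BYTES = bytes([0x3E, 0x12])  # assumed fixed for the LS9 (from the heartbeat)


def build_sysex(body: bytes, device: int = 0) -> bytes:
    """Wrap `body` as F0, the 4-byte Yamaha header, body, F7.

    Assumes the second header byte is 0x10 | device, as in the heartbeat
    F0 43 10 3E 12 7F F7 sent with the console at its default ID.
    """
    if not 0 <= device <= 0x0F:
        raise ValueError("device ID must fit in 4 bits")
    if any(b > 0x7F for b in body):
        raise ValueError("SysEx data bytes must have their top bit clear")
    header = bytes([YAMAHA_ID, 0x10 | device]) + MODEL_BYTES
    return b"\xf0" + header + body + b"\xf7"


def parse_sysex(message: bytes) -> tuple[int, bytes]:
    """Return (device, body) from a Yamaha SysEx message built as above."""
    if len(message) < 6 or message[0] != 0xF0 or message[-1] != 0xF7:
        raise ValueError("not a complete SysEx message")
    if message[1] != YAMAHA_ID or message[3:5] != MODEL_BYTES:
        raise ValueError("unexpected manufacturer or model bytes")
    return message[2] & 0x0F, message[5:-1]
```

For instance, `build_sysex(b"\x7f")` gives back the heartbeat F0 43 10 3E 12 7F F7. The body for any other message comes from the Parameter Change List.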
MIDI, ASCII, and 7b/8b encoding
But wait, there's more! Well, actually, one "gotcha" to figure out.
MIDI 1.0 is inherently a 7-bit protocol. But networking and modern computers (since the 1970s) have been built around the idea that there are 8 bits in a byte, not 7 or any other arbitrary number. This makes MIDI somewhat annoying to deal with, as modern systems and programming languages make dealing with bytes... passable, if not enjoyable. Dealing with individual bits just ain't it anymore. The console is modern and handles 8-bit bytes of data, yet even over a modern TCP connection, the 7-bit limitation still applies. It gets even more confusing when text is transferred, as it is encoded using ASCII, which is itself a 7-bit specification!
The console already zero-extends 7-bit ASCII text to 8-bit bytes, and the rest of the data it sends is 8-bit too. However, each MIDI data byte can only carry 7 bits, so the console splits every four 8-bit bytes (32 bits) across five MIDI bytes. 32 bits is just over four and a half 7-bit bytes' worth, so the first byte only carries 4 bits. Each of those MIDI bytes is then interpreted by modern operating systems as an 8-bit byte again!
What does all this actually mean? Well, it's like this:
...
0000 xxxx
0xxx xxxx
0xxx xxxx
0xxx xxxx
0xxx xxxx
1111 0111 (F7)
The 0's are not data. Every MIDI data byte must keep its top bit clear, because a byte with the top bit set is a status byte (like F0 and F7). The first byte also only needs 4 bits to make the total up to 32. The x's are the actual data that we care about. And how does ASCII fit into this?
...
0000 xxxx
0xnn nnnn
0xnn nnnn
0xnn nnnn
0xnn nnnn
1111 0111 (F7)
As before, the 0's are irrelevant and can be discarded, but now the x's are also always 0, as they have been extended from the 7-bit ASCII standard, leaving the n's as the actual text data that we want! And it is in this manner of complication and inefficiency that we manage to spend nearly 5 bytes of 8-bit data to transport three 7-bit ASCII characters.
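The 32-bit layout in the first diagram (one 0000 xxxx byte followed by four 0xxx xxxx bytes) can be sketched as a pair of Python helpers. This assumes the bits are packed most significant first. The diagram shows the layout but not the bit order, so it's worth checking against a known value from the Parameter Change List. `encode_u32` and `decode_u32` are illustrative names.

```python
def encode_u32(value: int) -> bytes:
    """Split a 32-bit value over five MIDI data bytes, most significant first.

    Layout (assumed bit order): 0000 xxxx, then four 0xxx xxxx bytes.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 bits")
    return bytes([
        (value >> 28) & 0x0F,  # top 4 bits
        (value >> 21) & 0x7F,
        (value >> 14) & 0x7F,
        (value >> 7) & 0x7F,
        value & 0x7F,          # bottom 7 bits
    ])


def decode_u32(data: bytes) -> int:
    """Reassemble the 32-bit value from the five MIDI data bytes."""
    if len(data) != 5 or data[0] > 0x0F or any(b > 0x7F for b in data[1:]):
        raise ValueError("expected 0000xxxx followed by four 7-bit bytes")
    value = 0
    for b in data:
        value = (value << 7) | b  # 4 + 4 * 7 = 32 bits in total
    return value
```

For example, `encode_u32(1023)` yields the bytes 00 00 00 07 7F, and `decode_u32` turns them back into 1023.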
Trademarks
"Yamaha" is a registered trademark of Yamaha Corporation.