
I wanted to do real-time audio visualization and didn't want to fight with music streaming service libraries more than once (I'm looking at you, LibSpotify), so I thought I'd go with the most general solution -- get the audio straight from the OS.
This post is written as a kind of information dump I would have wanted to read when I started figuring this all out.
I had wondered why there isn't any software like Virtual Audio Cable that would also provide programmatic access to whatever is running through the virtual device. So I took a look at how to write my own, and apparently it's really time-consuming and difficult. Not going to start there, then.
Anyway, it turns out that on Windows there's something called WASAPI that provides a solution: "loopback recording".
In loopback mode, a client of WASAPI can capture the audio stream that is being played by a rendering endpoint device.
And there's an almost-ready-to-use example for it! Although it's a slightly weird, goto-heavy, let-us-put-almost-everything-in-the-same-function kind of thing.
In the code example in Capturing a Stream, the RecordAudioStream function can be easily modified to configure a loopback-mode capture stream. The required modifications are:
- In the call to the IMMDeviceEnumerator::GetDefaultAudioEndpoint method, change the first parameter (dataFlow) from eCapture to eRender
- In the call to the IAudioClient::Initialize method, change the value of the second parameter (StreamFlags) from 0 to AUDCLNT_STREAMFLAGS_LOOPBACK.
I wasted a lot of time trying to figure out what format the data was being delivered in by default, and how to change that format to PCM, but it turns out the beans are spilled right here.
Basically you fill a WAVEFORMATEX struct to describe the format, or modify the struct as it is returned from a call to IAudioClient::GetMixFormat that "retrieves the stream format that the audio engine uses for its internal processing of shared-mode streams."
By the way, as long as you keep the same sample rate (e.g. 44.1 kHz) and channel count (2 for stereo), WASAPI can often provide the changed format straight away, so you don't have to do any actual conversion yourself.
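To give an idea of what "actual conversion" would mean if you did have to do it by hand, here's a minimal sketch. It assumes the captured samples are 32-bit floats in the range [-1, 1], which is a common shared-mode mix format but not something I've confirmed for every system. The function names floatToPcm16 and convertBuffer are made up for this example and aren't part of WASAPI.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one float sample (assumed to be in [-1, 1]) to signed 16-bit PCM.
// Out-of-range input is clamped so it can't overflow the 16-bit range.
std::int16_t floatToPcm16(float sample) {
    const float clamped = std::max(-1.0f, std::min(1.0f, sample));
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Convert a whole interleaved buffer. Channel order is left untouched,
// only the sample representation changes.
std::vector<std::int16_t> convertBuffer(const float* samples, std::size_t sampleCount) {
    std::vector<std::int16_t> out;
    out.reserve(sampleCount);
    for (std::size_t i = 0; i < sampleCount; ++i) {
        out.push_back(floatToPcm16(samples[i]));
    }
    return out;
}
```

Letting WASAPI hand you 16-bit PCM directly means you can skip this step completely.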
Here's how the mix format on my system could be changed to 16-bit PCM (in hindsight it would be a better idea to just fill the struct from scratch):
pwfx->wBitsPerSample = 16;
pwfx->nBlockAlign = pwfx->nChannels * pwfx->wBitsPerSample / 8; // bytes per frame, 4 for 16-bit stereo
pwfx->wFormatTag = WAVE_FORMAT_PCM;
pwfx->nAvgBytesPerSec = pwfx->nSamplesPerSec * pwfx->nBlockAlign;
pwfx->cbSize = 0; // no extra format data follows the struct
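For comparison, here's a rough sketch of the "just fill the struct" approach. So that the arithmetic can be checked without the Windows headers, it uses a stand-in struct called PcmFormat that mirrors the WAVEFORMATEX field names. PcmFormat, kFormatTagPcm and makePcmFormat are names made up for this example. In real code you'd fill an actual WAVEFORMATEX and use the WAVE_FORMAT_PCM constant.

```cpp
#include <cstdint>

// Stand-in for WAVEFORMATEX (hypothetical, for illustration only).
// The field names match the real struct.
struct PcmFormat {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;
};

// Same numeric value as WAVE_FORMAT_PCM in the Windows headers.
constexpr std::uint16_t kFormatTagPcm = 1;

// Derive the dependent fields from the three values you actually choose,
// so they can't get out of sync with each other.
PcmFormat makePcmFormat(std::uint32_t sampleRate,
                        std::uint16_t channels,
                        std::uint16_t bitsPerSample) {
    PcmFormat f{};
    f.wFormatTag = kFormatTagPcm;
    f.nChannels = channels;
    f.nSamplesPerSec = sampleRate;
    f.wBitsPerSample = bitsPerSample;
    // One frame holds one sample for every channel.
    f.nBlockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    f.nAvgBytesPerSec = sampleRate * f.nBlockAlign;
    f.cbSize = 0; // plain PCM carries no extra format data
    return f;
}
```

With makePcmFormat(44100, 2, 16) this gives an nBlockAlign of 4 and an nAvgBytesPerSec of 176400, matching the hardcoded values for 16-bit stereo.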
IAudioClient::IsFormatSupported can be used to check whether the format you want will work, without having to call Initialize and see if it fails.
One more thing: if you're not familiar with COM code, you have to initialize COM before calling IAudioClient::Initialize (in practice, before the first COM call such as CoCreateInstance). For me that meant just calling CoInitialize(nullptr) once somewhere before initializing the audio client.
In the code I wrote to try all this out, I just wrote the captured data to a file which I then imported to Audacity to check for correctness.
Note that the number from IAudioCaptureClient::GetBuffer describing the amount of data we got out of it is in frames. This means that to get the byte (or char) count that ostream::write, for example, needs, we have to do something like this:
int bytesPerSample = m_bitsPerSample / 8;
unsigned int byteCount = numFramesAvailable * bytesPerSample * m_nChannels;
Anyway, here's my example implementation, which you can check out if you get stuck on something: https://github.com/Tsarpf/windows-default-playback-device-to-pcm-file
Hope it's of use to someone.