
Friday, March 14, 2014

Developing a digital synthesizer in C++: Part 2 - Wavefile output

After reading the last part in my series, in which I showed you how to program a basic sine wave and store it in a buffer, you may be wondering how you can get from this array of integers inside your code to ... actually hearing something!

We have two options here:
  1. Sending the data buffer to your sound card for it to output it directly. Or:
  2. Storing and encoding the data in a certain file format.
While the first option may be more appealing because you get faster results, it is also a lot more complex. In this post I will show you how to do the second, namely storing the data in a file. In our case, as the title indicates, we will be dealing with the WAV format.

The WAV format

The Waveform Audio File Format is a lossless, uncompressed audio file format. It is part of the Resource Interchange File Format (RIFF) family and thus stores its information in a number of chunks. It was originally considered a Windows-only file format, though nowadays this is no longer true and it is one of the most popular audio file formats out there.

From a listener's standpoint, this is due to the fact that the audio data stored in the format is uncompressed, meaning every single sample we calculated will be stored in exactly that form, unchanged for virtually all eternity. This results in very good audio quality no matter how often you re-process the samples (given you do not change them). The downside to this audio quality is naturally that the file size is usually quite a bit larger than that of compressed formats such as MP3 (MPEG-1 Audio Layer III).
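
To put a number on that file size, here is a minimal sketch of the arithmetic for uncompressed PCM audio. The function name `pcmBytes` is my own and purely illustrative; the formula is simply sample rate times channels times bytes per sample times duration.

```cpp
#include <cstdint>

// Illustrative helper (hypothetical name): number of bytes needed to store
// uncompressed PCM audio of the given length.
uint64_t pcmBytes(uint32_t samplerate, uint16_t channels, uint16_t bits, uint32_t seconds)
{
    const uint64_t bytes_per_sample = bits / 8;
    return static_cast<uint64_t>(samplerate) * channels * bytes_per_sample * seconds;
}
```

For one minute of 16-bit stereo audio at 44100 Hz, `pcmBytes(44100, 2, 16, 60)` gives 10,584,000 bytes, so roughly 10 MB before the header is even counted.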

Now from the programmer's viewpoint, the popularity of the WAV file format stems from the fact that it is insanely simple. A .wav file is basically made of three "chunks" of data:

  1. RIFF chunk: 
    1. ID chunk: a size-4 char array holding the word "RIFF" to identify the file as part of the RIFF family.
    2. Chunk size: the size of the rest of the file in bytes, i.e. the total file size minus the 8 bytes taken up by the "RIFF" char array and this size field itself.
    3. File format chunk: another size-4 char array that identifies the specific file format of the RIFF Family, in our case this will hold the string "WAVE". 
  2. fmt (format) chunk:
    1. ID chunk again: size-4 char array "fmt " (in lowercase, and notice the trailing space)
    2. fmt chunk size (the size of the rest of this chunk, which is 16 bytes for PCM data)
    3. to 8. Other meta-information about the data buffer: format code, number of channels, sample rate, byte rate, block alignment and bits per sample.
  3. data chunk:
    1. "data" char array (in lowercase)
    2. Chunk size (the number of bytes of raw sample data)
    3. The raw data buffer
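
As a rough sketch of how these sizes nest, assume a plain PCM file made of exactly these three chunks. Every chunk starts with an 8-byte header (a 4-byte ID plus a 4-byte size), and the size field of a chunk counts only what follows it. The constant and function names below are my own, chosen for illustration.

```cpp
#include <cstdint>

// Sizes in bytes for a canonical PCM WAV file (assumed layout: RIFF, fmt, data).
const uint32_t kChunkHeaderSize = 8;   // 4-byte ID + 4-byte size field
const uint32_t kWaveIdSize = 4;        // the "WAVE" format identifier
const uint32_t kFmtBodySize = 16;      // the PCM format fields

// Value of the RIFF chunk size field: everything after the first 8 bytes.
uint32_t riffChunkSize(uint32_t data_bytes)
{
    return kWaveIdSize
         + kChunkHeaderSize + kFmtBodySize   // fmt chunk
         + kChunkHeaderSize + data_bytes;    // data chunk
}

// Total size of the file on disk.
uint32_t wavFileSize(uint32_t data_bytes)
{
    return kChunkHeaderSize + riffChunkSize(data_bytes);
}
```

With no samples at all, `wavFileSize(0)` is 44 bytes, which is the size of the header on its own.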

I'm not kidding you, this is not some XML format I made up for my latest side-project, but one of the most widely-used and most popular file formats on this planet. If you want to read more on the exact specifications, have a look at this website.

The code

All of these chunks can be implemented in code as the following struct:

    #include <cstdint>

    struct waveheader
    {
        uint8_t riff_id[4];   // 'R' 'I' 'F' 'F'
        uint32_t riff_size;   // file size in bytes minus 8 (ID and this field)
        uint8_t wavetype[4];  // 'W' 'A' 'V' 'E'
        uint8_t fmt_id[4];    // 'f' 'm' 't' ' '
        uint32_t fmt_size;    // 16 for PCM
        uint16_t fmt_code;    // 1 = pulse code modulation
        uint16_t channels;    // 1 = mono, 2 = stereo
        uint32_t samplerate;  // samples per second, e.g. 44100
        uint32_t byterate;    // bytes per second
        uint16_t align;       // bytes per sample * channels
        uint16_t bits;        // bits per sample, 16 in our case
        uint8_t wave_id[4];   // 'd' 'a' 't' 'a'
        uint32_t wave_size;   // total number of data bytes
    };
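
Because the function further down writes this struct to disk byte for byte, it is worth checking that the compiler has not inserted any padding between the members. The following is a sketch of such a check. It repeats the struct so that it compiles on its own, and the `static_assert` lines are my addition, not part of the original code.

```cpp
#include <cstddef>
#include <cstdint>

struct waveheader
{
    uint8_t riff_id[4];
    uint32_t riff_size;
    uint8_t wavetype[4];
    uint8_t fmt_id[4];
    uint32_t fmt_size;
    uint16_t fmt_code;
    uint16_t channels;
    uint32_t samplerate;
    uint32_t byterate;
    uint16_t align;
    uint16_t bits;
    uint8_t wave_id[4];
    uint32_t wave_size;
};

// The canonical PCM WAV header is 44 bytes long, so any padding would
// shift the fields away from where a WAV reader expects them.
static_assert(sizeof(waveheader) == 44, "waveheader must not contain padding");
static_assert(offsetof(waveheader, fmt_id) == 12, "fmt chunk must start at byte 12");
static_assert(offsetof(waveheader, wave_id) == 36, "data chunk must start at byte 36");
static_assert(offsetof(waveheader, wave_size) == 40, "data size must sit at byte 40");
```

If one of these assertions fails on your compiler, the struct cannot be written to the file as is.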

Our task now is to fill in the members of this struct and write it to a file along with the data buffer we created in the last part. Below is a function that does exactly that (the struct I shared above is declared outside the function). Note that for stereo output we have to write each sample to the file twice in a row, once for each channel, because the WAV format stores the channels interleaved (have a look at the exact specification from the link I shared above to see why).
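
The duplication of each sample can be sketched in isolation like this. The function name `monoToStereo` is hypothetical, and the sketch uses `std::vector` purely for brevity.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: copy every mono sample into a left and a right slot,
// giving the interleaved layout L R L R ... that a stereo WAV file stores.
std::vector<int16_t> monoToStereo(const std::vector<int16_t>& mono)
{
    std::vector<int16_t> stereo;
    stereo.reserve(mono.size() * 2);
    for (int16_t sample : mono)
    {
        stereo.push_back(sample);  // left channel
        stereo.push_back(sample);  // right channel
    }
    return stereo;
}
```

A mono buffer holding 1, -2, 3 becomes 1, 1, -2, -2, 3, 3, so both speakers play the same signal.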

    #include <cstdint>
    #include <cstring>
    #include <fstream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    void writeWav(const std::string& fname, const int16_t* buff, uint32_t dur, uint16_t sr, uint8_t ch)
    {
        waveheader wh;
        // copy the string "RIFF" into the riff_id char array
        std::memcpy(wh.riff_id, "RIFF", 4);
        // same for the wavetype char array
        std::memcpy(wh.wavetype, "WAVE", 4);
        // take notice of the trailing whitespace in the string
        std::memcpy(wh.fmt_id, "fmt ", 4);
        // fmt_size is 16, the size (in bytes) of the rest of the subchunk
        wh.fmt_size = 16;
        // fmt_code is the audio format used for storing the data. This
        // should usually be 1, which stands for linear Pulse Code
        // Modulation (PCM), meaning the samples are stored as plain,
        // uncompressed amplitude values.
        wh.fmt_code = 1;
        wh.channels = ch;    // 1 = mono, 2 = stereo
        wh.samplerate = sr;
        // The size of the individual samples. Since we are using
        // 16bit signed integers to represent the sample values,
        // we will use 16 here.
        wh.bits = 16;
        // Align is the total number of bytes for all channels
        wh.align = (wh.channels * wh.bits) / 8;
        // How many bytes per second: there are samplerate
        // (44100) samples per second, each being of size
        // align ((ch * bits) / 8)
        wh.byterate = wh.samplerate * wh.align;
        // copy the "data" string to the chunk id array
        std::memcpy(wh.wave_id, "data", 4);
        // number of samples per channel
        uint32_t total_samples = sr * dur;
        // number of data bytes in the file, the two
        // stands for the 2 bytes per sample sent per
        // channel
        uint32_t byte_total = total_samples * 2 * ch;
        // now that we have ALL the info, we can calculate
        // the total size: the whole file minus the 8 bytes
        // of riff_id and riff_size
        wh.riff_size = static_cast<uint32_t>(byte_total + sizeof(wh) - 8);
        // assigning byte_total
        wh.wave_size = byte_total;
        // Samples are stored interleaved, so a single sample
        // has to be written once for each channel, in a row.
        std::vector<int16_t> end_buffer(total_samples * ch);
        for (uint32_t n = 0; n < total_samples; ++n)
        {
            for (uint8_t c = 0; c < ch; ++c)
            {
                end_buffer[n * ch + c] = buff[n];
            }
        }
        // Writing the struct and the samples byte for byte assumes a
        // little-endian machine (e.g. x86), because WAV stores all of
        // its numbers in little-endian byte order.
        std::ofstream f(fname, std::ios::binary | std::ios::trunc);

        if (! f.write(reinterpret_cast<const char*>(&wh), sizeof(wh)) ||
            ! f.write(reinterpret_cast<const char*>(end_buffer.data()), byte_total))
        {
            throw std::runtime_error("Thy digital oscillation datum did not write properly!");
        }
    }
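
If you want to check a file written this way without opening an audio player, you can read the first 44 bytes back and inspect the fields. Below is a minimal sketch of such a check, assuming the canonical 44-byte PCM layout described above (files from other programs may contain extra chunks and would be rejected). `WavInfo`, `parseWavHeader` and the two little-endian helpers are names I made up for this example. The numbers are decoded byte by byte, so the sketch does not depend on the byte order of your machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical result type holding the fields we care about.
struct WavInfo
{
    uint16_t channels;
    uint32_t samplerate;
    uint16_t bits;
    uint32_t data_bytes;
};

// Decode little-endian integers from raw bytes.
static uint16_t readLE16(const unsigned char* p)
{
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

static uint32_t readLE32(const unsigned char* p)
{
    return static_cast<uint32_t>(p[0])
         | (static_cast<uint32_t>(p[1]) << 8)
         | (static_cast<uint32_t>(p[2]) << 16)
         | (static_cast<uint32_t>(p[3]) << 24);
}

// Returns true and fills `out` if `bytes` starts with a canonical
// 44-byte PCM WAV header, false otherwise.
bool parseWavHeader(const unsigned char* bytes, std::size_t len, WavInfo& out)
{
    if (len < 44)
        return false;
    if (std::memcmp(bytes, "RIFF", 4) != 0 ||
        std::memcmp(bytes + 8, "WAVE", 4) != 0 ||
        std::memcmp(bytes + 12, "fmt ", 4) != 0 ||
        std::memcmp(bytes + 36, "data", 4) != 0)
        return false;
    // fmt chunk size must be 16 and the format code must be 1 (PCM)
    if (readLE32(bytes + 16) != 16 || readLE16(bytes + 20) != 1)
        return false;
    out.channels = readLE16(bytes + 22);
    out.samplerate = readLE32(bytes + 24);
    out.bits = readLE16(bytes + 34);
    out.data_bytes = readLE32(bytes + 40);
    return true;
}
```

Read the first 44 bytes of the file into an array (for example with `std::ifstream::read`) and pass them in. A return value of `false` means the header does not have the expected layout.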

The code together with the comments should be pretty self-explanatory and as far as I'm concerned, this code works perfectly. I hope it does so for you too, but please feel free to comment if you are having problems or have any questions about this topic. I am currently working on creating ADSR envelopes as well as directing the sound to the sound card for direct playback and will post either of these steps in the following part of this series.
