Audio compression: facts, myths and a blind listening test
Audio compression is part and parcel of day-to-day life. Almost every piece of music you listen to has been compressed. But audio signal processing is hard to understand unless you specialise or are trained in it. That’s why – at least in my experience – most people either don’t bother to understand it or demonise MP3 and anything linked with compression.
I wanted to know: can you really enjoy music if you only listen to it on Spotify and YouTube? Or do we not even notice the difference from the best possible quality?
Numbers and what they mean
Various parameters give you information about sound quality – but how do you decode it all? Here is an outline of the terms you’re likely to come across:
1. Bit rate
Bit rate expresses the number of bits processed per second. It’s sometimes called a data transfer rate or bandwidth.
The concept is fairly intuitive: the more data, the greater the sound quality. In everyday terms, bit rate is the most important parameter. However, looking at it alone won’t tell you much about sound quality.
Here’s where it gets interesting. There isn’t just one type of bit rate: a bit rate is either constant or variable. These days, most bit rates are variable (VBR for short). In passages where «not a lot happens», the encoder can compress more heavily without audible loss. Complex passages, on the other hand, get more data. The upshot is higher sound quality for the same file size. Variable bit rates are usually given as an average and sometimes as the maximum possible value.
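To make the link between bit rate and file size concrete, here is a minimal Python sketch. The function name and the four-minute track length are my own assumptions for illustration, and the calculation ignores container overhead and tags. For a variable bit rate, the average value is the one that determines the size.

```python
def file_size_megabytes(bit_rate_kbps: float, duration_seconds: float) -> float:
    """Approximate size of an audio stream (hypothetical helper).

    Ignores container overhead and metadata. For VBR files, pass the
    average bit rate.
    """
    total_bits = bit_rate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000  # bits -> bytes -> megabytes


if __name__ == "__main__":
    four_minutes = 4 * 60  # assumed track length in seconds
    for kbps in (128, 256):
        size = file_size_megabytes(kbps, four_minutes)
        print(f"{kbps} kbit/s for 4 min: {size:.2f} MB")
```

Doubling the bit rate doubles the file size, which is why a VBR encoder that spends bits only where they are needed can sound better at the same size.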
2. Compression process
AAC compresses more efficiently than MP3. This means you get better quality than MP3 for an identical bit rate. The same goes for Ogg Vorbis, which Spotify uses.
Even the encoder, the software that does the compressing, has an impact on quality. In the early days of MP3, 128 kbit/s tracks often sounded awful. Now they’re much improved, as poor quality encoders are no longer used.
3. Bit depth
Bit depth represents how many bits a sample has. That’s why it’s also known as sampling depth. The more bits there are per sample, the more nuances in volume across the track.
If you’re a photography or video buff, you might have already heard of bit depth. The good news is, bit depth in audio compression has a similar meaning.
A CD uses 16 bits per sample for each stereo channel. MP3s and other compressed audio files, on the other hand, don’t have a set bit depth. While bit depth doesn’t play much of a part in day-to-day life, it is an important part of studio recordings. In that context, 24 bit is sometimes used to leave more headroom when the recording is processed. Afterwards, the music is scaled down to 16 bit, as audio experts claim you can’t hear the difference at that point.
To be honest, it’s Neil Young’s fault that bit depth is being talked about outside of recording studios at all. Young sells a music player called Pono that uses a 24-bit format. Listen to a Neil Young track played in 16 bit and 8 bit (not 24) here. Try it out and see if you can hear the difference. If you think that’s tricky, don’t get me started on the 16 bit v 24 bit comparison.
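If you want to put numbers on 8, 16 and 24 bit, the following Python sketch may help. It is a simplified illustration, not how any particular player or encoder works. The function names are my own, and the theoretical dynamic range uses the common rule of thumb of roughly 6 dB per bit.

```python
import math


def levels(bit_depth: int) -> int:
    """Number of distinct volume steps a sample can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range, roughly 6.02 dB per bit."""
    return 20 * math.log10(2 ** bit_depth)


def quantize(sample: float, bit_depth: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest step (simplified)."""
    steps = 2 ** (bit_depth - 1)
    code = max(-steps, min(steps - 1, round(sample * steps)))
    return code / steps


if __name__ == "__main__":
    for bits in (8, 16, 24):
        error = abs(quantize(0.3, bits) - 0.3)
        print(f"{bits} bit: {levels(bits)} levels, "
              f"{dynamic_range_db(bits):.1f} dB, rounding error {error:.2e}")
```

The rounding error shrinks rapidly with each extra bit, which is why the step from 8 to 16 bit is so much easier to hear than the step from 16 to 24 bit.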
4. Sample rate
Sample rate (also known as sampling rate) doesn’t come into the equation for the average listener. It is essential, though, for understanding how audio is stored digitally. Let me give you an example. A CD has a sample rate of 44100 Hz or 44.1 kHz. Hertz is the unit for the number of events per second. In terms of audio sampling, that means the sound level is measured 44,100 times every second. As with bit depth, it’s worth working with higher values in the studio, but these won’t be kept in the final format that goes on sale.
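Sample rate, bit depth and the number of channels together give the bit rate of uncompressed audio. As a quick sketch (the function name is my own), this is where the 1411 kbit/s figure for CD quality comes from:

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


if __name__ == "__main__":
    cd = pcm_bit_rate(44_100, 16, 2)  # CD: 44.1 kHz, 16 bit, stereo
    print(f"CD audio: {cd} bit/s = {cd / 1000} kbit/s")
```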
Nyquist theorem: many people think digital music implies loss and that you’re missing out on the real analogue signal. But this isn’t unexplored territory – on the contrary. This debate already came up with the advent of CDs. Audio snobs would deride newfangled CDs in favour of good old-fashioned records. But as history has shown, they were eventually proved wrong. The Nyquist sampling theorem says that you can completely reconstruct an audio waveform, without any loss, from individual sample points. This assumes the sampling rate is high enough. More specifically, the theorem says the rate has to be at least twice as high as the bandwidth, that is, the highest frequency in the signal. As the limit of human hearing is around 20,000 Hz, the bandwidth is chosen in this range. That’s why the sampling rate is over 40,000 Hz.
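The flip side of the theorem is what happens when the sampling rate is too low: a tone above half the sampling rate produces exactly the same samples as a lower tone, so the two can no longer be told apart (this is known as aliasing). The Python sketch below illustrates that under my own assumptions: the function names are invented for this example, and the 30 kHz test tone is arbitrary.

```python
import math


def min_sample_rate(bandwidth_hz: float) -> float:
    """Lowest sampling rate the theorem allows for a given bandwidth."""
    return 2 * bandwidth_hz


def apparent_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a sampled tone appears to have after aliasing."""
    nearest_multiple = sample_rate_hz * round(tone_hz / sample_rate_hz)
    return abs(tone_hz - nearest_multiple)


def sample_cosine(tone_hz: float, sample_rate_hz: float, count: int) -> list:
    """Take `count` samples of a cosine wave at the given sampling rate."""
    return [math.cos(2 * math.pi * tone_hz * n / sample_rate_hz)
            for n in range(count)]


if __name__ == "__main__":
    rate = 44_100
    print(min_sample_rate(20_000))           # rate needed for 20 kHz bandwidth
    print(apparent_frequency(20_000, rate))  # below rate / 2: unchanged
    print(apparent_frequency(30_000, rate))  # above rate / 2: folds down

    # A 30 kHz tone and a 14.1 kHz tone yield the same samples at 44.1 kHz.
    high = sample_cosine(30_000, rate, 10)
    low = sample_cosine(14_100, rate, 10)
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low)))
```

This is also why recordings are filtered to the audible range before sampling: anything above half the sampling rate would otherwise show up as a false tone.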
5. Other factors
You can have all the parameters in the world, but they won’t be any use if the audio has been recorded badly. For example, you’ll lose dynamic range if your sound technician doesn’t set the recording level high enough. When you listen back to the recording and turn up the volume, you’re met with noise. But set the level too high and the result is even worse: your recording is distorted, fragmented and scratchy. Or heavy dynamic range compression could make the result unrecognisable. Bad recordings are all over YouTube and common on CDs. They’re often the result of very old studio recordings and live extracts from concerts.
As we’re on the topic of sound quality, your headphones and speakers also play a part. With poor quality mini speakers, for instance, you’ll barely be able to tell the difference between MP3 in 128 kbit/s and uncompressed music. That’s something you’d be able to distinguish on good speakers.
And now it’s time to put our ideas to the test
As part of this article, I got ten members of the digitec team to take part in a blind listening test. I made sure the group was an equal mix of those who didn’t work with audio quality and weren’t too fussy about it and others who thought it was essential.
Here’s what the group stats looked like: two women and eight men took part in the test. The age range for seven of the ten participants was 25 to 30, with the oldest in the group being 40. You can’t accuse my guinea pigs of going deaf with old age. All in all, they were a good mix of your average listeners, audio experts and people who make their own music. Most of them used my Sennheiser HD 449 headphones; one person wanted to use their own and another did the test with a pair of our Logitech office headsets.
They all listened to three music extracts from different genres (classical, jazz, pop/rock). The clips were about 30 to 45 seconds long. For each one, I took the .wav file in CD quality (1411 kbit/s, PCM 16 bit) and compressed it to a number of levels using LAME and Apple’s AAC encoder:
- MP3 V9 (lowest quality, roughly 65 kbit/s VBR): this is really quite bad and is rarely used.
- MP3 V5 (medium quality, roughly 130 kbit/s VBR): you’ll still come across this in the world of streaming but it’s a thing of the past in downloads.
- MP3 V0 (highest quality, roughly 245 kbit/s VBR): this is the quality you get in the Amazon shop.
- AAC 256 kbit/s: given how much more efficient the AAC process is, it should give better results than the best MP3. This is the quality you find in the iTunes Store.
Then I converted the files back into WAV/PCM so you couldn’t tell which files were which by looking at them. In other words, all the files were the same size.
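To see why converting back to WAV hides the compression level, compare the sizes before and after. This Python sketch rests on assumptions of mine: a clip length of 30 seconds, the approximate VBR averages listed above treated as exact, and no file headers. The names are invented for the example.

```python
# Approximate average bit rates in kbit/s, taken from the list above.
LEVELS = {"MP3 V9": 65, "MP3 V5": 130, "MP3 V0": 245, "AAC": 256}

CD_KBPS = 44_100 * 16 * 2 / 1000  # uncompressed CD quality


def encoded_bytes(kbps: float, seconds: float) -> float:
    """Approximate size of the compressed file, ignoring headers."""
    return kbps * 1000 * seconds / 8


def decoded_pcm_bytes(seconds: int, sample_rate: int = 44_100,
                      bit_depth: int = 16, channels: int = 2) -> int:
    """Size of the audio data once decoded back to PCM, ignoring headers."""
    return sample_rate * (bit_depth // 8) * channels * seconds


if __name__ == "__main__":
    clip = 30  # assumed clip length in seconds
    for name, kbps in LEVELS.items():
        print(f"{name}: {encoded_bytes(kbps, clip) / 1000:.0f} kB encoded, "
              f"{decoded_pcm_bytes(clip) / 1000:.0f} kB as WAV, "
              f"about {CD_KBPS / kbps:.1f}x smaller than CD quality")
```

The decoded size depends only on clip length, sample rate, bit depth and channel count, so every version ends up the same size no matter how heavily it was compressed in between. The lost detail does not come back, of course.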
Try it out for yourself: you can even take the listening test at home. All you need to do is download these files from the digitec website. To make sure it stays a blind test, extract the zip file rather than browsing its contents. Only the unzipped files all appear the same size.
Results: interpreting the blind test
I originally intended to introduce each person and write up their individual results. But as I saw how the experiment was progressing, I realised that would just send you to sleep. The reason being the results were the same, regardless of whether the listener was an occasional listener or an audio geek.
All of the guinea pigs were quick to identify the worst quality (MP3 with VBR 65 kbit/s). Only two members of the group didn’t have the same success with the classical music. However, answers varied wildly for the other four levels. The participants unanimously admitted to being unsure or having to guess. They got the answer right about 20% of the time.
Strangely enough, the second worst file wasn’t identified as often as the best three. The alleged experts were no better than the occasional listeners. Now that’s something I didn’t expect. Everything I’d read said this level was easy to set apart from the others. Most of the time, the guinea pigs couldn’t distinguish average (variable) bit rates starting at 192 kbit/s from the original (Source). The second worst level was much lower at 130 kbit/s VBR.
The participants didn’t get a heads-up about what they would get to listen to each time. If they had, the results might have been better. But I was more interested in recreating an everyday scenario. I mean, we listen to music in our spare time because we love it and not because we enjoy identifying compression faults.
You could, of course, complain that I didn’t carry out this test in optimal conditions. With top-of-the-range headphones, a special playback device and the backdrop of a silent office, you might be able to hear more. However, as I explained, my aim was a realistic environment. It goes without saying that if you swear by uncompressed music, it’s not enough to just play the best quality files. You need to have the right gear as well. That means buying the high-end version of everything, from headphones to speakers, amplifiers, playback devices and even cables.
YouTube – the special case
Most YouTube videos use AAC with 128 kbit/s. In theory, the quality should be good enough. After all, our blind test participants couldn’t even tell which was an MP3 track when it had the same bit rate – and AAC is much better. That being said, I can hear a difference between music I play on my record player and music from YouTube videos. That probably has something to do with the fact that these audio files are converted a number of times. When you move sound to the video editor and export the video, the clip is already being compressed. When you upload it to YouTube, you’re converting the file again.
I tried to put that to the test by making a video with WAV audio and medium MP3. When it came to exporting, I made sure there was no audio compression. I used the same files as in the blind listening test, which you can download. I didn’t notice any distinct difference, but listen for yourself and let me know what you think.
The compression process is steadily improving. The variable bit rates, better codecs and optimised encoders available these days deliver quality so good it’s very difficult – if not impossible – to tell compressed files and CD tracks apart.
Buying music from Amazon or Apple means you’re always on the safe side. You won’t have sound quality issues and there's even room for improvement. The same applies when you stream Spotify and select the highest quality. Find out more here. As I mentioned, the Ogg Vorbis codec is better than MP3, which is why 96 kbit/s is still acceptable.
The problem arises when you have compressed material that is compressed again. But even in this instance, my YouTube test made me doubt whether it’s as bad as I thought. It’s certainly not a good idea to convert your music collection from MP3 into AAC just because this is the better codec. You should also keep an uncompressed copy of music you recorded yourself and export from that to MP3 or AAC for listening. As for Bluetooth, it compresses music that is already compressed. However, Bluetooth technology has come on in leaps and bounds in recent years. In fact, if you’re using a high-quality, uncompressed file and a high-end Bluetooth codec, you shouldn’t be able to detect any flaws.
After this test, I’m feeling a lot more relaxed about the whole issue of audio compression – except where double compression is concerned. I realise now there’s no point in me worrying about the fancy-pants FLAC files that won’t play on my smartphone. Maybe the Spotify generation with their carefree approach to the topic (if they even think about it at all) aren’t too far off the mark. MP3 in today’s standard quality is good enough and plays everywhere – and since April 2017, MP3 has even been licence-free.