AIFF versus WAV: Which audio file format is best?
Question: WAV and AIFF File format: what’s the difference?
Which is best for audio? MP3, MP4, FLAC and other less common formats also make up a share of the audio files out there, but which is best, WAV or AIFF?
Answer: AIFF and WAV are the exact same quality
WAV is mostly used on PCs and AIFF is mostly used on Macs, but either can be played on virtually all types of computers and devices. Both use the same type of encoding, which results in a relatively large file size but maintains a higher quality sound than MP3, M4A or other smaller files.
So when should you use WAV / AIFF and when should you use MP3 / M4A etc?
When you are concerned with speed and small file size, use MP3. For example, on the internet, or in an email. No one wants to download a 50 MB file to listen to a song on their phone, or to send a quick tune to a friend. If you’re streaming on a website, or YouTube, that’s likely streaming a lossy file format similar to MP3. In fact, MP3 comes out of the MPEG video standards: it is the audio layer (MPEG-1 Audio Layer III), without the video part.
How Downsampling Affects Audio Quality
For illustrative purposes, imagine the top image is your full quality audio file, and the bottom is your compressed MP3. The bottom image lacks the perceived clarity and depth of field of the top image. This is a similar visual concept to how MP3s and other lossy formats are able to get pretty close to the original file, but lack the data to fully represent the full waveform.
This image shows how a downsampled image can be lower quality than the original to save space. Generally, it looks about the same, but closer inspection shows loss of detail. Just like a pixelated JPEG, audio files that have too much lossy compression can be lackluster, and sound like they lack the detail of a WAV or AIFF file. The top image would represent a WAV or AIFF, and the bottom image would represent an MP3 or MP4, although the image is not an actual representation of audio, just an illustration of how compression affects quality.
What is a WAV file and how do I play it?
Almost all phones and computers (including Macs and Windows PCs) can play a WAV file. Generally, AIFF can be played on an Apple product like an iPhone or a macOS based computer, and almost any media player like VLC or iTunes will play both. Originally developed by IBM and Microsoft, WAV (Wave) files are a raw audio format from before most people had the internet. While it is very old, the format is very basic, and is essentially a list of measurements describing a sound wave.
Both WAV and AIFF use the same encoding method!
Both AIFF and WAV are based on the same IFF* (Interchange File Format). AVI, ANI, and WAV all use RIFF (Resource Interchange File Format), a flavor of IFF*, which is built from pieces of data referred to as chunks. There is the main data chunk, as well as a name chunk, artist chunk, copyright chunk, etc., where additional data can be added for those categories. In addition, WAV and AIFF files can have multiple channels, from just one mono channel, to two stereo channels, 5.1 (six channels), 8 channels or more.
*David noted “…, AIFF is not derived from RIFF. Both AIFF and RIFF were derived from IFF, released by Electronic Arts in 1985. Also, AIFF preceded RIFF by 3 years. It cannot be based on RIFF. They are very similar.”
*Thanks for the correction, David! So, AIFF is derived from IFF, and WAV uses RIFF, which is also derived from IFF. (Thanks for the clarification!)
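To make the chunk idea concrete, here is a minimal Python sketch that builds a tiny silent WAV in memory and lists the chunks inside it. The helper names `make_wav_bytes` and `list_chunks` are made up for this illustration, and the sketch only handles the simple RIFF/WAVE layout, not every variation found in the wild.

```python
import io
import struct
import wave

def make_wav_bytes():
    """Build a tiny one-second silent mono WAV in memory (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 2 bytes per sample = 16 bit
        w.setframerate(8000)    # 8k sample rate keeps the example small
        w.writeframes(b"\x00\x00" * 8000)
    return buf.getvalue()

def list_chunks(data):
    """Return (chunk_id, size) pairs from a RIFF/WAVE byte string."""
    riff, _total, form = struct.unpack("<4sI4s", data[:12])
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks, pos = [], 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        chunks.append((chunk_id.decode("ascii"), size))
        pos += 8 + size + (size & 1)   # chunks are padded to an even length
    return chunks

print(list_chunks(make_wav_bytes()))
```

For this generated file the list contains a format chunk (`fmt `) describing the audio and a `data` chunk holding the samples. An AIFF file is laid out in the same spirit, with its own chunk names and big-endian sizes.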
WAV and AIFF Encoding Explained Further
Both AIFF and WAV are lossless file formats, in other words, there is no loss of data. The file format differs slightly, but the digital information is stored exactly as it was captured, sample by sample. WAV and AIFF both use PCM (Pulse Code Modulation) to encode the data in a manner that minimizes loss of quality. WAV and AIFF can both be CD quality or “studio quality”, with CD being 16 bit and “studio quality” usually being 24 bit or higher. The general rule of thumb is if you record at 16 bit, render the files at 16 bit. If you record at 24 bit, render your mixes at 24 bit. Files can always be reduced to a lower bit depth. For example, 24 bit can easily be reduced to 16, but once you are at 16, going back to 24 bit is pretty useless because the discarded detail does not come back.
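Here is a rough Python sketch of why going from 16 bit back to 24 bit does not help. It assumes the simplest possible conversion, which just drops the lowest 8 bits (real converters usually add dither first), and the function names and sample value are made up for the example.

```python
def to_16_bit(sample_24):
    """Reduce a signed 24 bit sample to 16 bit by dropping the low 8 bits."""
    return sample_24 >> 8

def to_24_bit(sample_16):
    """Pad a signed 16 bit sample back out to 24 bit; the new low bits are zeros."""
    return sample_16 << 8

original = 1_234_567                          # hypothetical 24 bit sample value
round_trip = to_24_bit(to_16_bit(original))

# The round trip lands close to the original, but the fine detail is gone.
print(original, round_trip, original - round_trip)
```

The round-tripped value is 1,234,432, which is 135 steps away from the original. The file is 24 bit again, but it carries no more detail than the 16 bit version did.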
Since WAV and AIFF files are lossless and uncompressed, they can take up a LOT of S P A C E !
Both WAV and AIFF can be encoded with timestamps, tempo information, and other types of information like markers. Pro Tools or Logic can create WAVs or AIFFs. According to internet “sources” the difference is the byte order, with AIFF (big-endian) being optimized for Motorola processors and WAV (little-endian) being optimized for Intel based microprocessors, but really there is no difference in performance.
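As a small illustration of the byte order difference, the Python sketch below packs the same 16 bit sample both ways. The sample value is arbitrary and chosen only so the two bytes are easy to tell apart.

```python
import struct

sample = 0x1234   # one hypothetical 16 bit sample value

little = struct.pack("<h", sample)   # little-endian, the order WAV uses
big = struct.pack(">h", sample)      # big-endian, the order AIFF uses

print(little.hex(), big.hex())       # same two bytes, opposite order

# Read back with the matching byte order, both give the same number.
assert struct.unpack("<h", little) == struct.unpack(">h", big)
```

The bytes come out as `3412` for the WAV order and `1234` for the AIFF order. The audio itself is identical, only the storage order differs.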
The exciting world of Pulse Code Modulation
Pulse code modulation is a mathematical way to digitally represent analog signals. It is used in digital audio devices. The amplitude (roughly, the level or loudness of a sound) is measured at regular points in time. The number of times the amplitude is measured per second is called the sample rate. For example, a 44.1k sample rate means that 44,100 samples per second are captured. For 96k, the sound is measured 96,000 times a second.
Bit depth (often loosely called bit rate) is the number of steps on the ‘measuring stick’ that measures the amplitude. 16 bit and 24 bit are the most common, and the general idea is that a higher bit depth is more precise. Each extra bit doubles the number of steps: 16 bit has 65,536 steps on the scale, while 24 bit has 16,777,216. That gives 24 bit a higher dynamic range than 16 bit (roughly 144 dB versus 96 dB), or in other words a more precise measurement.
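A minimal Python sketch of the idea: measure a sine wave at regular intervals and round each reading to a whole number. The function name `sample_sine` and the 1k tone at an 8k sample rate are assumptions picked to keep the output short, not anything a real recorder is limited to.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples, bit_depth=16):
    """Measure a sine wave n_samples times and round each reading to an integer."""
    max_amp = 2 ** (bit_depth - 1) - 1          # 32767 for 16 bit
    return [
        round(max_amp * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(n_samples)
    ]

# One full cycle of a 1k tone captured at an 8k sample rate is 8 samples.
print(sample_sine(1_000, 8_000, 8))
```

Each number in the list is one measurement of the amplitude. A WAV or AIFF file is, at its core, a long list of numbers like these.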
While this may be a slight oversimplification, you get the idea. More Bits is better. A higher sample rate is more exact.
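For the curious, the arithmetic behind the measuring stick can be sketched in a few lines of Python. The function names are made up for this example, and the dynamic range figure is the usual theoretical estimate rather than what any particular converter achieves.

```python
import math

def levels(bit_depth):
    """Number of distinct steps on the measuring stick."""
    return 2 ** bit_depth

def dynamic_range_db(bit_depth):
    """Theoretical ratio between the largest and smallest step, in decibels."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    print(bits, levels(bits), round(dynamic_range_db(bits), 1))
```

This gives 65,536 steps and about 96.3 dB for 16 bit, and 16,777,216 steps and about 144.5 dB for 24 bit.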
Some common sample rate and bit depth combinations would be:
- 44.1k 16 bit (CD Quality)
- 48k 16 bit (DVD-Video Quality)
- 96k 24 bit (DVD-Audio Quality, DVD-A)
In the studio, 48k 24 bit or 96k 24 bit are often used as “studio quality” and then downsampled later. Most people do not want 96k WAV files. They are just too big, and who listens to “better than CD quality” these days anyway? Maybe audiophiles and studio people, but many of them listen to MP3 or lossy versions too. You need your songs to sound great in all formats, and getting caught up in all the different versions can be a rabbit hole you don’t want to go down. Plenty of great material has been recorded at 44.1k, and plenty of terrible material has been recorded at 192k (or even higher!), but a high sample rate doesn’t make it sound any better if it’s no good from the start.
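To put numbers on “too big”, here is a quick Python sketch of the file size arithmetic for uncompressed PCM. The function name and the 4 minute stereo song are assumptions for the example, and the small file header is ignored.

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Size of the raw audio data, ignoring the small file header."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# A hypothetical 4 minute (240 second) stereo song.
cd_quality = pcm_size_bytes(44_100, 16, 2, 240)
studio_96k = pcm_size_bytes(96_000, 24, 2, 240)

print(f"44.1k 16 bit: {cd_quality / 1_000_000:.1f} MB")
print(f"96k 24 bit:   {studio_96k / 1_000_000:.1f} MB")
```

That works out to about 42.3 MB at CD quality and about 138.2 MB at 96k 24 bit, more than three times the size for the same song.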
Quality Analysis via Nyquist-Shannon Sampling Theorem
The Nyquist theorem, or Nyquist-Shannon sampling theorem, is a mathematical result for determining the theoretical maximum frequency you can reproduce at a given sample rate. It states that frequencies below half the sampling rate can be reconstructed. The range of human hearing can be estimated to be 20 Hz to 20 kHz. So using that formula, 44.1k can reproduce frequencies up to 22.05k, which comfortably covers the 20k limit (that being said, 20k is barely perceptible, and with 2.5k to 5k still registering in the “higher pitch” areas, and 10k and 12k being piercingly high, 20k is not that useful of a frequency when it comes to mixing audio).
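The half-the-sample-rate rule can be sketched in Python as follows. The function names are made up for this example, and `alias_hz` assumes a pure tone sampled with no anti-aliasing filter in front, which a real converter would have.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency that can be reconstructed at this sample rate."""
    return sample_rate_hz / 2

def alias_hz(freq_hz, sample_rate_hz):
    """Frequency a pure tone shows up at after sampling, with no filter in front."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_limit_hz(44_100))     # limit for CD quality audio
print(alias_hz(20_000, 44_100))     # below the limit: comes through unchanged
print(alias_hz(30_000, 44_100))     # above the limit: folds down to a lower pitch
```

At 44.1k the limit is 22,050 Hz. A 20k tone is captured as 20k, while a 30k tone would fold back and show up at 14.1k, which is why converters filter out everything above the limit before sampling.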
Debate and Conclusion
In my experience, bit depth (16 bit vs 24 bit) oftentimes makes more difference to the audio than the sample rate (e.g. 44.1k vs 48k).
For some material 96k or 192k may sound a bit better, but the enormous file size is not worth it. Material will eventually get converted to MP3, and there are other things like final gain staging that matter more. Check your file output gain with a meter, make sure you aren’t clipping, and leave a bit of headroom for the converters and downsampling.
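If you want to see what a meter is doing, here is a rough Python sketch of a peak check on 16 bit samples. The function names and the sample values are made up for the example, and a real meter does more than this (true peak detection, for instance).

```python
import math

def peak_dbfs(samples, bit_depth=16):
    """Peak level relative to full scale, in dBFS (0 is the maximum)."""
    full_scale = 2 ** (bit_depth - 1)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")        # pure silence has no peak level
    return 20 * math.log10(peak / full_scale)

def is_clipping(samples, bit_depth=16):
    """True if any sample sits at the very top or bottom of the scale."""
    full_scale = 2 ** (bit_depth - 1)
    return any(s >= full_scale - 1 or s <= -full_scale for s in samples)

mix = [16_384, -8_192, 4_096]       # hypothetical 16 bit sample values
print(round(peak_dbfs(mix), 1), is_clipping(mix))
```

Here the loudest sample is at half of full scale, so the peak reads about -6.0 dBFS and nothing is clipping, which leaves some headroom.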
So why use 96k at all? By ensuring frequencies up to 40k are covered, we have a very accurate version of the sound that can be used to mix down to 44.1k or something more reasonable. Most plugins can use 96k, and some listeners do report a quality difference between 44.1k and 96k, although controlled listening tests are far from unanimous on this. 192k or even higher can be used, but that could be debated as far as perceptible quality vs file size. Try it out for yourself.
There are literally hundreds of factors that go into the sound of a recording. Microphone choice and placement, pre-amps, converter quality and anti-aliasing, clocking, as well as the physical environment all play a role, making much greater changes to the sound than 48k vs 96k or 44.1k vs 48k. So don’t sweat it too much.