WAV or AIFF vs MP3 or M4A: Which audio file format is best?

Quick answer… which is best to use to store audio data? WAV, AIFF, MP3, or M4A?

Answer: WAV is usually the best quality and the most common format for studio recordings and audio samples. MP3 is suitable for email, but in our view M4A is the worst of the bunch and should be avoided. AIFF is good too: it's the same quality as WAV (see below).

What are recommended settings for WAV audio?

WAV and AIFF are both described by a bit depth (bits per sample) and a sample rate (in kHz).

A 24 bit, 48k WAV or AIFF file is great quality for most purposes.

But 44.1k files, as well as 16 bit or 32 bit files, can sometimes be fine too!

16 bit can be fine for bounce quality, and 32 bit can be useful for high dynamic range recording. A higher sample rate means more samples per second, while a higher bit depth means each sample is measured more precisely; in general, 24 bit is a good bit depth for studio or home recording. 192k files are just too large to deal with, and the extra resolution doesn't sound any different in a world where many people listen on a cell phone or other small device, streaming over Wi-Fi. Sometimes we use 192k, but generally it's a pain to deal with files that large for minimal quality improvement, and we've even noticed an inverse correlation between file size and the quality of the music.
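To put rough numbers on those file sizes, here is a small Python sketch that multiplies sample rate × bytes per sample × channels × duration. It assumes a hypothetical 4-minute stereo song and ignores the few dozen bytes of file header and any metadata, so real files will be slightly larger.

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: float) -> int:
    """Approximate size of uncompressed PCM audio data (header and metadata ignored)."""
    bytes_per_sample = bit_depth // 8
    return int(sample_rate_hz * bytes_per_sample * channels * seconds)


# Hypothetical example: a 4-minute stereo song at a few common settings.
SONG_SECONDS = 4 * 60
for rate, bits in [(44_100, 16), (48_000, 24), (96_000, 24), (192_000, 24)]:
    size_mb = pcm_size_bytes(rate, bits, channels=2, seconds=SONG_SECONDS) / 1_000_000
    print(f"{rate / 1000:g} kHz / {bits}-bit: {size_mb:.1f} MB")
```

Running it should print about 42.3 MB for 44.1k/16 bit, 69.1 MB for 48k/24 bit, 138.2 MB for 96k/24 bit, and 276.5 MB for 192k/24 bit, so the 192k version of the same song is more than six times the size of the CD-quality one.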

A More Detailed Explanation:

Question: WAV and AIFF file formats: what's the audio quality difference?

Which is best for audio? MP3 and MP4, as well as FLAC and other less common formats, also make up some of the audio files out there, but which is best: WAV or AIFF?

Answer: AIFF and WAV are exactly the same quality (at the same bit depth and sample rate).

WAV is mostly used on PCs and AIFF mostly on Macs, but either can be played on virtually any computer or device. Both use the same type of encoding (uncompressed PCM), which results in a relatively large file size but maintains higher sound quality than MP3, M4A, or other smaller formats.

So when should you use WAV / AIFF and when should you use MP3 / M4A etc?

When you are concerned with speed and small file size, use MP3: for example, on the internet or in an email. No one wants to download a 50 MB file to listen to a song on their phone, or to send a quick tune to a friend. If you're streaming on a website or YouTube, you're likely streaming a lossy file format similar to MP3. In fact, MP3 is short for MPEG-1 Audio Layer III: it is the audio part of the MPEG-1 audio and video compression standard, without the video part.
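For a sense of how much smaller MP3 is, this sketch compares the bitrate of CD-quality PCM (44.1k, 16 bit, stereo) with two common MP3 bitrates. The 4-minute duration and the 128 and 320 kbps choices are just illustrative assumptions, and headers and tags are ignored.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Size in megabytes of audio at a constant bitrate (headers and tags ignored)."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


cd = pcm_bitrate_kbps(44_100, 16, 2)  # CD-quality WAV/AIFF: 1411.2 kbps
SONG_SECONDS = 4 * 60                 # hypothetical 4-minute song

for label, kbps in [("CD-quality WAV", cd), ("MP3 320 kbps", 320), ("MP3 128 kbps", 128)]:
    print(f"{label}: {size_mb(kbps, SONG_SECONDS):.1f} MB ({cd / kbps:.1f}:1 vs. CD-quality WAV)")
```

CD-quality PCM works out to 1,411.2 kbps, so a 320 kbps MP3 is roughly 4.4 times smaller and a 128 kbps MP3 roughly 11 times smaller than the WAV.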

How Using an MP3 or M4A Affects Your Audio Quality: Why Lossy Compression Hurts Audio Quality

For illustrative purposes, imagine the top image is your full-quality WAV or AIFF audio file, and the bottom is your lossy MP3 or M4A. The bottom image lacks the perceived clarity and depth of field of the top one. In a similar way, MP3s and other lossy formats get pretty close to the original file, but lack the data to fully represent the full waveform.

Image before Downsampling Data

Image after Downsampling Data (Using Lossy Data Compression)

This image illustrates how a downsampled image can be lower quality than the original in order to save space. At a glance, it looks about the same, but closer inspection shows a loss of detail. Just like a pixelated JPEG, audio files with too much lossy compression can sound lackluster, lacking the detail of a full-quality WAV or AIFF file.

The top image stands in for a WAV or AIFF and the bottom image for an MP3 or M4A, but the images are not an actual representation of audio; they are just a way to imagine how lossy MP3 or M4A compression affects quality.

Uhhh. So how do I play a WAV file?

Almost all phones and computers (including Macs and Windows PCs) can play a WAV file. AIFF is most at home on Apple products like the iPhone or a macOS computer, but almost any media player, such as iTunes, will play both. WAV was originally developed by IBM and Microsoft, and AIFF by Apple; both are raw, full-quality audio file formats from before most people even had the internet. While they are old, the formats are robust and high quality: the audio is stored as a series of amplitude measurements (samples) describing the sound wave, taken at a specific sample rate. 44.1k (44,100 samples per second) is the sample rate used on audio CDs.
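If you want to check a WAV file's sample rate, bit depth, and channel count yourself, Python's standard-library wave module can read them from the header. This sketch first writes one second of a hypothetical 440 Hz test tone at the CD rate of 44.1k into memory, then reads the header back; with a real file you would pass its path to wave.open instead of the in-memory buffer.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # CD sample rate
DURATION_S = 1.0
FREQ_HZ = 440.0       # hypothetical test tone

# One second of a 440 Hz sine at half of full scale, as signed 16-bit integers.
n = int(SAMPLE_RATE * DURATION_S)
samples = [int(0.5 * 32767 * math.sin(2 * math.pi * FREQ_HZ * i / SAMPLE_RATE)) for i in range(n)]

buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)            # mono
    w.setsampwidth(2)            # 2 bytes per sample = 16 bit
    w.setframerate(SAMPLE_RATE)
    w.writeframes(struct.pack(f"<{n}h", *samples))  # 16-bit little-endian PCM

# Read the header back, as you would for a WAV file on disk.
buf.seek(0)
with wave.open(buf, "rb") as r:
    channels = r.getnchannels()
    bit_depth = r.getsampwidth() * 8
    rate = r.getframerate()
    frames = r.getnframes()

print(f"{channels} channel(s), {bit_depth}-bit, {rate} Hz, {frames / rate:.2f} s")
```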

Both WAV and AIFF use the same encoding method!

WAV uses RIFF (Resource Interchange File Format). AIFF stands for Audio Interchange File Format, and it is derived from IFF (Interchange File Format).

While WAV was developed by IBM and Microsoft, AIFF was developed by Apple. They are both uncompressed linear formats.

Both AIFF and WAV are based on IFF* (Interchange File Format). AVI, ANI, and WAV all use RIFF (Resource Interchange File Format), a flavor of IFF*, which organizes data into pieces called chunks. There is the main audio data chunk, a chunk describing the format, and optional chunks such as name, artist, and copyright, where additional data can be added for those categories. In addition, WAV and AIFF files can have multiple channels: from one mono channel, to two stereo channels, to six channels for 5.1, eight channels, or more.
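To see the chunk structure described above, this sketch walks the top-level chunks of a RIFF/WAVE file: each chunk is a 4-byte ID, a 4-byte little-endian size, and a body padded to an even length. The list_riff_chunks helper is a hypothetical name, and the test file is a tiny 48k, 24 bit stereo WAV of silence built in memory with Python's wave module, which writes only the format and data chunks; a file saved from a DAW with metadata may show extra chunks, such as a LIST chunk.

```python
import io
import struct
import wave


def list_riff_chunks(data: bytes) -> list[tuple[str, int]]:
    """Return (chunk_id, size) pairs for the top-level chunks of a RIFF/WAVE file."""
    riff_id, riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    offset = 12                          # skip the 12-byte RIFF header
    end = min(len(data), 8 + riff_size)  # RIFF size counts everything after the first 8 bytes
    while offset + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        chunks.append((chunk_id.decode("ascii"), size))
        offset += 8 + size + (size & 1)  # chunk bodies are padded to an even length
    return chunks


# Build a tiny stereo WAV in memory: 480 frames (0.01 s) of silence at 48 kHz, 24 bit.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(3)                    # 3 bytes per sample = 24 bit
    w.setframerate(48_000)
    w.writeframes(bytes(480 * 2 * 3))

chunks = list_riff_chunks(buf.getvalue())
print(chunks)
```

For this file it should print [('fmt ', 16), ('data', 2880)]: a 16-byte format chunk followed by 480 frames × 2 channels × 3 bytes of audio data.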

*David noted “…, AIFF is not derived from RIFF. Both AIFF and RIFF were derived from IFF, released by Electronic Arts in 1985. Also, AIFF preceded RIFF by 3 years. It cannot be based on RIFF. They are very similar.”
*Thanks for the correction, David! So AIFF is derived from IFF, and WAV uses RIFF, which is also derived from IFF.

WAV and AIFF Audio Data Encoding Explained Further

Both AIFF and WAV are lossless file formats; in other words, there is no loss of data. The file formats differ slightly, but in both the digital information is stored as a precise numeric representation of the waveform at a given sample rate and bit depth. WAV and AIFF both use PCM (Pulse Code Modulation) to encode the data without compression, so no quality is lost along the way. WAV or AIFF files can be CD quality or "studio quality", with CD being 16 bit and "studio quality" usually being 24 bit or higher. The general rule of thumb: if you record at 16 bit, render the files at 16 bit; if you record at 24 bit, render your mixes at 24 bit. Files can always be converted to a lower bit depth. For example, 24 bit can easily be reduced to 16 bit (ideally with dither), but once you are at 16 bit, going back to 24 bit is pretty useless, because the discarded detail does not come back.
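The one-way nature of going from 24 bit to 16 bit can be sketched in a few lines: reduce 24-bit sample values to 16 bits by rounding, pad them back to 24 bits, and compare. This is plain rounding without dither, and the sample values are arbitrary; it only shows that the low-order detail (up to about one 16-bit step, or 256 in 24-bit units) is gone after the round trip.

```python
def to_16_bit(sample_24: int) -> int:
    """Reduce a signed 24-bit sample to 16 bits by rounding (no dither), clamped to range."""
    return max(-32768, min(32767, round(sample_24 / 256)))


def to_24_bit(sample_16: int) -> int:
    """Pad a 16-bit sample back to 24 bits; the bottom 8 bits are simply zero."""
    return sample_16 * 256


# Arbitrary 24-bit sample values (the 24-bit range is -8,388,608 to 8,388,607).
original = [1_000_003, -5_000_127, 129, 8_388_607]
round_trip = [to_24_bit(to_16_bit(s)) for s in original]
errors = [a - b for a, b in zip(original, round_trip)]

print(round_trip)
print(errors)  # nonzero: the detail below one 16-bit step is lost
```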

Since WAV and AIFF files are uncompressed, they can take up a LOT of S P A C E !

Both WAV and AIFF can store additional information such as timestamps, tempo information, and markers. DAWs like Pro Tools, Logic, or Studio One can create WAV or AIFF files. The main technical difference between the two is byte order: AIFF stores samples big-endian, as the Motorola processors in early Macs did, while WAV stores them little-endian, as Intel processors do. In practice, there is no difference in performance.

File size is the bigger concern: 192k or 96k files might just be taking up a ton of space on your computer. Large album projects can easily reach 100 GB or more even at 44.1k, and at 192k the same project would be about 4.35 times larger (192 ÷ 44.1), or roughly 430 GB, so ask yourself whether that is going to be manageable for you. It might just be extra information clogging up the works, when there are other ways to improve the sound than using a very high sample rate.
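The byte-order difference is easy to see with Python's struct module: the same 16-bit sample value is stored with its two bytes in opposite orders in WAV (little-endian) and AIFF (big-endian). The wav_to_aiff_order helper below is a hypothetical illustration that only swaps raw 16-bit samples; it does not convert complete files with headers and chunks.

```python
import struct

sample = 1000                           # one signed 16-bit sample value (0x03E8)
wav_bytes = struct.pack("<h", sample)   # little-endian, as stored in WAV
aiff_bytes = struct.pack(">h", sample)  # big-endian, as stored in AIFF
print(wav_bytes.hex(), aiff_bytes.hex())  # e803 03e8

# Both decode to the same value when read with the matching byte order.
assert struct.unpack("<h", wav_bytes)[0] == struct.unpack(">h", aiff_bytes)[0] == sample


def wav_to_aiff_order(pcm16_le: bytes) -> bytes:
    """Swap raw 16-bit little-endian samples (WAV order) to big-endian (AIFF order)."""
    n = len(pcm16_le) // 2
    return struct.pack(f">{n}h", *struct.unpack(f"<{n}h", pcm16_le))
```

The audio values are identical either way; only the order of the bytes on disk changes, which is why neither format sounds better than the other.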

The exciting world of Pulse Code Modulation

Pulse code modulation is a mathematical way to digitally represent analog signals, and it is used in digital audio devices. The amplitude (roughly, the level or loudness of the sound) is measured at regular points in time. The number of times the amplitude is measured per second is called the sample rate. For example, a 44.1k sample rate means that 44,100 samples are captured per second; at 96k, the sound is measured 96,000 times a second.
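Sampling itself can be sketched by evaluating a waveform at evenly spaced moments n / sample_rate. This example assumes an idealized 1 kHz sine wave in place of a real analog signal; at 44.1k, consecutive samples are about 22.7 microseconds apart.

```python
import math


def sample_sine(freq_hz: float, sample_rate_hz: int, seconds: float) -> list[float]:
    """Measure an ideal sine wave's amplitude at n / sample_rate for each sample n."""
    n_samples = int(sample_rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(n_samples)]


one_second_cd = sample_sine(1000, 44_100, 1.0)   # 44,100 measurements
one_second_96k = sample_sine(1000, 96_000, 1.0)  # 96,000 measurements
print(len(one_second_cd), len(one_second_96k))
```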

Bit depth sets the number of steps on the 'measuring stick' that measures the amplitude. 16 bit and 24 bit are the most common, and the general idea is that a higher bit depth is more precise. Each extra bit doubles the number of steps: 16 bit gives 65,536 steps, while 24 bit gives 16,777,216, which is 256 times as many (8 more bits). That finer measurement is why 24 bit has a higher dynamic range than 16 bit.
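The step counts, and the usual rule of thumb of roughly 6 dB of dynamic range per bit, follow from powers of two. This sketch uses the simple ratio 20·log10(2^bits) between full scale and one step; published signal-to-noise figures for a full-scale sine wave add about 1.76 dB to this.

```python
import math


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct amplitude steps available at a given bit depth."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth: int) -> float:
    """Ratio of full scale to one quantization step, in decibels (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)


for bits in (16, 24):
    print(f"{bits}-bit: {quantization_levels(bits):,} levels, ~{dynamic_range_db(bits):.0f} dB")
```

This prints 65,536 levels and about 96 dB for 16 bit, and 16,777,216 levels and about 144 dB for 24 bit.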

While this may be a slight oversimplification, you get the idea: more bits means finer amplitude measurements, and a higher sample rate means more measurements per second, which lets higher frequencies be captured.

Some common sample rates and bit depths for high-quality audio files are:

44.1k 16 bit (CD quality)
48k 16 bit (DVD-Video quality)
96k 24 bit (DVD-Audio quality, DVD-A)

In the studio, 48k 24 bit or 96k 24 bit are often used as "studio quality" and then downsampled later. Most people do not want 96k WAV files. They're just too big, and who listens to "better than CD quality" these days anyway? Maybe audiophiles and studio people, but many listen to MP3 or other lossy versions too. You need your songs to sound great in all formats, and getting caught up in all the different versions can be a rabbit hole you don't want to go down. Plenty of great material has been recorded at 44.1k, and plenty of terrible material has been recorded at 192k (or even higher!), but a high sample rate doesn't make a recording sound any better if it's no good from the start. There is a lot more to sound quality than just sample rate, so really don't sweat it.

Quality Analysis via the Nyquist-Shannon Sampling Theorem

The Nyquist-Shannon sampling theorem (often just called the Nyquist theorem) determines the theoretical maximum frequency you can reproduce at a given sample rate. It states that frequencies below half the sampling rate (the Nyquist frequency) can be reconstructed. The range of human hearing is commonly estimated at 20 Hz to 20 kHz. So at a 44.1k sample rate, the Nyquist frequency is 22.05 kHz, which should reproduce everything up to the 20k limit. (That being said, 20k is barely perceptible: 2.5k to 5k already registers in the "higher pitch" range, and 10k to 12k is piercingly high, so 20k is not a very useful frequency when it comes to mixing audio.)
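As a rough illustration of the theorem, this sketch prints the Nyquist frequency (half the sample rate) for common rates, then shows what happens to a tone above it: sampled at 44.1k with no anti-aliasing filter, a 25 kHz tone produces the same samples as a 19.1 kHz tone, because 44.1 - 25 = 19.1. The alias_frequency helper is a hypothetical name, and real converters filter out content above the Nyquist frequency before sampling precisely to avoid this.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be reconstructed at this sample rate."""
    return sample_rate_hz / 2


def alias_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears at after sampling with no anti-aliasing filter."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for fs in (44_100, 48_000, 96_000, 192_000):
    print(f"{fs} Hz sampling -> Nyquist {nyquist_hz(fs):,.0f} Hz")

fs = 44_100
print(alias_frequency(25_000, fs))  # 19100: the 25 kHz tone "folds" back below Nyquist

# Numerical check: the sampled 25 kHz and 19.1 kHz cosines are indistinguishable.
a = [math.cos(2 * math.pi * 25_000 * n / fs) for n in range(100)]
b = [math.cos(2 * math.pi * 19_100 * n / fs) for n in range(100)]
assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
```

The Nyquist frequencies come out to 22,050 Hz, 24,000 Hz, 48,000 Hz, and 96,000 Hz for 44.1k, 48k, 96k, and 192k.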

Debate and Conclusion

In my experience, bit depth (16 bit vs 24 bit) often makes more difference to the audio than sample rate (e.g. 44.1k vs 48k).

For some material, 96k or 192k may sound a bit better, but the enormous file size is not worth it. Material will eventually get converted to MP3, and other things, like final gain staging, matter more. Check your file's output gain with a meter, make sure you aren't clipping, and leave a bit of headroom for the converters and downsampling.
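A basic sample-peak check can be sketched as below, assuming floating-point samples where 1.0 is digital full scale (0 dBFS). The check_headroom name and its 1 dB headroom target are hypothetical placeholders, not a recommendation from this article, and a sample-peak reading can miss inter-sample peaks that a true-peak meter would catch.

```python
import math


def peak_dbfs(samples: list[float]) -> float:
    """Sample-peak level in dBFS, where 1.0 (or -1.0) is digital full scale."""
    peak = max(abs(s) for s in samples)
    return -math.inf if peak == 0 else 20 * math.log10(peak)


def check_headroom(samples: list[float], min_headroom_db: float = 1.0) -> dict:
    """Report the peak level, whether it reaches full scale, and whether the
    hypothetical headroom target (default 1 dB below 0 dBFS) is met."""
    level = peak_dbfs(samples)
    return {
        "peak_dbfs": level,
        "clipping": level >= 0.0,         # at or above full scale: likely clipped
        "ok": level <= -min_headroom_db,  # at least min_headroom_db below 0 dBFS
    }


# Example: one second of a 440 Hz tone at half of full scale (about -6 dBFS).
mix = [0.5 * math.sin(2 * math.pi * 440 * n / 48_000) for n in range(48_000)]
report = check_headroom(mix)
print(f"peak {report['peak_dbfs']:.1f} dBFS, clipping={report['clipping']}, ok={report['ok']}")
```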

So why use 96k at all? With a Nyquist frequency of 48 kHz, 96k covers frequencies well beyond the range of hearing, giving us a very accurate version of the sound that can be used to mix down to 44.1k or something more reasonable. Most plugins can run at 96k, and some listeners and listening tests report a quality difference between 44.1k and 96k, though this is debated. 192k or even higher can be used, but whether the perceptible quality is worth the file size is also debatable. Try it out for yourself.

There are literally hundreds of factors that go into the sound of a recording. Microphone choice and placement, preamps, converter quality and anti-aliasing, clock jitter, and the physical environment you record in all play a role, making much greater changes to the sound than 48k vs 96k or 44.1k vs 48k. So just check that your settings are reasonable, and don't worry about it too much.