Audio Formats

The Formats

Wave (or WAV)

The standard audio file format on the PC is the wave file. Wave files come in different quality settings, from CD-quality stereo down to telephone-quality mono, with many combinations in between. These sounds can be played using any of the following programs: Windows Media Player, Sound Recorder, or the QuickTime Player. The file extension is .wav

Example - hello.wav, size:310kb

AIF/AIFF

The equivalent sound file format on the Macintosh is AIF/AIFF, and it has similar quality settings to the wave format. The QuickTime Player and Windows Media Player 7.1 for Macintosh will play these files. The file extensions are .aif and .aiff

Example - RwWhistle.aiff, size:56kb (a Red-winged Blackbird)

MP3

A widely popular format today is MP3. The format was originally conceived as part of a combined audio and video standard (MPEG-1); for music use the video portion was dropped, and the MPEG-1 Layer 3 audio format was born. These files are popular because they are compressed (smaller) in comparison to wave files, yet the sound quality can be virtually indistinguishable. Their relatively small size makes them good for transferring over the internet. These files can be played using Windows Media Player, Musicmatch, WinAmp, or QuickTime. The file extension is .mp3

Example1 - hello.mp3, size:30kb (the same sound as the wave file from above, but 1/10 the size- that's MP3 compression!)

Example2 - speed_128.mp3, size:697kb (a 45 second "song" at the 128kbps bit rate)

Real

There are other compressed audio formats. The first is Real Audio, from Real.com. You need a Real player to play these files. The company pushes its RealOne player, but that player's pop-up ads and windows can be so annoying that they take away from the music experience; if you can, get Real Player version 8. The file extensions are .ra, .rm, or .ram

Example - hello.rm, size:23kb (the same sound as above)

Windows Media

Another type of compressed audio comes from Microsoft. The Windows Media Audio format is another direct competitor to MP3. Microsoft claims that this format is CD quality at half the size of MP3 - well, not quite. When audio files are compared at the same bit rate (the number of bits used to represent each second of sound), the files are essentially the same in terms of quality and file size. Windows Media Player plays Windows Media Audio files, and Microsoft has just released its Media Player 9 Series. The file extension is .wma

Example - hello.wma, size: 43kb (the same sound as above)

MIDI

MIDI is a file format unlike any of the others mentioned here. It isn't really a sound file, but rather a description of one: which notes to play, when, and how hard. It relies on your sound card to interpret that data and play the song, so the higher the quality of the MIDI synthesiser on your sound card, the more realistic the sound will be. Since it is only a description and not the full sound, it is a relatively small file. Windows Media Player and QuickTime, along with countless other programs, will play MIDI files. The file extension is .mid

Example - getting_to_know.mid, size:13kb
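To see how little data a MIDI file actually carries, here is a minimal Python sketch that builds the raw bytes of the note-on and note-off messages defined by the MIDI protocol; three bytes are enough to say "play middle C at this volume":

```python
# Build raw MIDI channel messages: a status byte (message type + channel),
# then two data bytes. This is the actual wire format of the MIDI protocol.

def note_on(channel: int, note: int, velocity: int) -> bytes:
    # 0x90 = note-on; the low nibble selects MIDI channel 0-15
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel: int, note: int) -> bytes:
    # 0x80 = note-off; velocity 0 is typical
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

# Middle C (note 60) at medium velocity on channel 0: just 3 bytes each.
print(note_on(0, 60, 64).hex())   # '903c40'
print(note_off(0, 60).hex())      # '803c00'
```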

BIT

What is a BIT ?

A bit is a 0 or a 1. Since a computer is essentially just millions of switches, it works with ON and OFF states; this is called BINARY, because the numbers are in base 2.

Binary (2-bit word)    Base 10 equivalent
00                     0
01                     1
10                     2
11                     3

In this table you see the four possible two-bit words and their equivalent numbers in base 10, the base we are used to seeing numbers in. They are two-bit words because they have two ON or OFF states. I will stray from this for a moment to explain the audio side....
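If you want to check the table yourself, a couple of lines of Python will convert between binary words and base-10 numbers:

```python
# Convert a binary word (a string of bits) to base 10, and back again.
print(int("10", 2))        # 2  -- the two-bit word 10 is decimal 2
print(int("11", 2))        # 3
print(format(3, "02b"))    # '11' -- decimal 3 as a two-bit word
```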

Dynamic range of 1 BIT ?

[Figure: a rough sketch of digital audio theory - what a bit and a sample are, and the dynamic range of digital audio.]

In this picture (I'm not a very good artist, LOL) I have tried to show that BITS are the Y axis on a graph, and that each bit encodes 6 dB of DYNAMIC RANGE. The X axis is, of course, TIME, which is set by the sample rate.

Most people know how a television works: a dot is painted onto a screen, and the dot moves very fast across the screen, painting the picture. Remember when a bright light is shone into your eyes? After the light has been turned off you can still see a blurry dot; this is called persistence of vision. It allows the brain to see the fast-moving dot on a TV screen as a full picture! (It's said that dogs lack this persistence of vision and cannot see pictures on a TV.) The refresh rate of a TV must be more than 30-40 hertz for a picture to be seen; that's also why mains power is 50 Hz, so light bulbs don't appear to flicker. In other words, more than 30-40 pictures (snapshots) must be painted each second for our eyes to see a constant picture.

Getting back to audio now: samples are like snapshots of sound, and just like early black-and-white projection movies, if they are not played back at a fast enough speed your ears will hear the gaps. I'm not sure what the "speed of the ear" is, but if someone yells into your ear you hear a ringing afterwards, so you do have a kind of persistence of hearing. More on sample rates later on... Back to BITS....

An 8-bit word looks like this: "01001000". Each extra bit we add to a word length DOUBLES the possible combinations; in terms of audio, we double the quantisation values, the number of Y-axis values we can round the level of the audio to. In the digital realm you can't have 1 and a half; it has to be either 1 or 2. Hence the term quantisation, or rounding: the level of the audio has to be rounded to the nearest value the bits allow.
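Here is a minimal Python sketch of that rounding step. It assumes input samples are floats in the range -1.0 to +1.0, which is a common convention rather than anything the hardware dictates:

```python
def quantise(sample: float, bits: int) -> int:
    """Round a sample in [-1.0, 1.0] to the nearest n-bit integer level."""
    levels = 2 ** (bits - 1)          # e.g. 128 for 8-bit signed audio
    value = round(sample * (levels - 1))
    return max(-levels, min(levels - 1, value))  # clamp to the legal range

# The same sample, rounded to 8-bit and then 16-bit levels:
print(quantise(0.3001, 8))    # 38    -- coarse rounding
print(quantise(0.3001, 16))   # 9833  -- much finer steps
```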

Quality of 8-bit vs 16-bit and 24-bit ?

Number of bits    Example bit word     Quantisation levels
2                 01                   4
3                 110                  8
...               ...                  ...
8                 00101011             256
16                0000110001010010     65536
20                (a 20-bit word)      1048576
24                (a 24-bit word)      16777216

As you can see from this table, by going from 16-bit audio to 24-bit audio we gain 256 TIMES the resolution in each word (sample), since 2^24 / 2^16 = 256. That's why professionals will record in 20 bits or more. On a side note, most professionals will record at the same sample rate that the final product will be mastered to, e.g. 44.1 kHz, even when they have the capability of recording at 96 kHz or beyond; there are many reasons why, which I won't go into here. This is now changing, as DVD allows for higher sampling rates, and capturing the original sound at a higher sampling rate leaves the engineer more flexibility later on.

How large is a BIT ?

8 bits make up 1 byte of storage, 16 bits take up two bytes, and 24 bits take up 3 bytes. This is generally true, but there are some exceptions; I won't make it more complicated by going into them here.

What is a dB (DeciBel) ?

dB stands for deciBel: 10 deciBels make up one Bel. One decibel is approximately the smallest change in loudness that the normal ear can detect. The decibel scale is LOGARITHMIC - an increase of 10 dB represents roughly a tripling of signal amplitude (a tenfold increase in power) - so you cannot treat dB values like normal numbers when adding and subtracting them.

Dynamic Range of Digital Audio

What is Dynamic range

Dynamic range represents the difference between the maximum signal that can be recorded (0 dBFS, digital full scale) and the noise floor of your system. The noise floor is the noise present in your system without any signal. A system with a high dynamic range will be quieter than one with a lower dynamic range. Dynamic range is measured in decibels (dB).

What's the dynamic range of 16-bit and 24-bit audio ?

One bit encodes roughly 6 dB of dynamic range. Therefore a 24-bit system theoretically has a dynamic range of 144 dB (24 x 6 = 144), and a 16-bit system has a theoretical dynamic range of 96 dB.
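The 6 dB per bit figure is itself a rounding; the exact value is 20 * log10(2), about 6.02 dB per bit, which a quick Python check confirms:

```python
import math

def dynamic_range_db(bits: int) -> float:
    # Each bit doubles the number of levels; in dB that is 20*log10(2) ~ 6.02 dB
    return 20 * math.log10(2 ** bits)

print(dynamic_range_db(16))   # ~96.33 dB
print(dynamic_range_db(24))   # ~144.49 dB
```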

Why don't converters have 96 dB or 144 dB of range ?

Current analog-to-digital converters typically reach full scale with an input of around +7 dBu. If they were to have 144 dB of dynamic range, they would have to resolve signals down around 100 nanovolts - roughly a tenth of a millionth of a volt! Transistors and resistors produce noise in this range just from electrons moving around due to heat. Even if the converters could be perfectly designed to read these levels, the low-noise requirements on the surrounding circuitry, such as power supplies and amplifiers, would be so stringent that they would be either impossible or too expensive to build.
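As a sanity check, here is that arithmetic in Python; 0 dBu is referenced to 0.7746 volts, and the +7 dBu full scale is taken from the paragraph above:

```python
import math

DBU_REF_VOLTS = 0.7746                        # 0 dBu reference voltage
full_scale = DBU_REF_VOLTS * 10 ** (7 / 20)   # +7 dBu -> ~1.74 V

# Smallest signal a converter with 144 dB of range would have to resolve:
floor = full_scale / 10 ** (144 / 20)
print(f"{floor * 1e9:.0f} nV")   # ~109 nV
```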

An average RMS dynamic range of around 120 dB is about as good as it gets to date with mass-produced 24-bit converters.

Nyquist Theory

To sample (or graph) a SINE wave you must have at least two points per cycle in order to work out what the frequency is: for example, the ORIGIN (y = 0) and either a MAXIMUM or a MINIMUM value. Because of this simple fact, to record a frequency you must sample at at least double that frequency. E.g. to record a 50 Hz sine wave you need a MINIMUM of 100 samples per second. This basic fact, which governs the minimum sampling rate, is called the Nyquist theorem. The Nyquist frequency is the highest frequency you can record with a given sample rate; in the case of a recording at 44,100 samples per second (the sampling rate of CDs), the Nyquist frequency is 22,050 Hz. I could draw pictures to prove this, but I won't, because you have seen the quality of my drawings above. LOL. The Nyquist theorem is not something you will hear much about, but it is good to know what it is and how it affects things in real-life situations.
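In code the relationship is trivial. The little Python sketch below also computes where a tone above the Nyquist limit folds back to, which leads into the next section:

```python
def nyquist(sample_rate: float) -> float:
    # The highest frequency a given sample rate can represent
    return sample_rate / 2

def alias(freq: float, sample_rate: float) -> float:
    # Where a tone above the Nyquist frequency folds back to
    return abs(freq - sample_rate * round(freq / sample_rate))

print(nyquist(44100))        # 22050.0 Hz
print(alias(22051, 44100))   # 22049 Hz - just below Nyquist
```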

What is a BEAT frequency ?

Guitarists use the beat frequency to tune guitars with harmonics. When a guitarist picks a harmonic, the string can only vibrate with waves corresponding to the length of the string; by touching the string at a particular fret you create what physics calls nodes and anti-nodes. When you play two harmonics that are very close together, you hear a rise and fall in the perceived level of the notes: that is the BEAT frequency. The difference between any two frequencies creates a new frequency; the formula is F1 - F2 = BEAT.

Back to relating this to dynamic range: in simple terms, for every Hz you go over the Nyquist frequency, you get an artifact equal to the difference between the recorded frequency and the Nyquist frequency. WOW, that is hard to explain in simple terms. For example, if you record a 22,051 Hz sine wave at a 44.1 kHz sample rate, you will get a 1 Hz rumble in your audio from going over the Nyquist frequency; record 22,080 Hz and you will get a 30 Hz rumble, and so on. Once again I will spare you the drawings I could do to prove this with graphs.
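The arithmetic itself is one line; here is a tiny Python sketch (the 440/442 Hz pair is just an illustrative tuning example):

```python
def beat(f1: float, f2: float) -> float:
    # Two nearby tones produce a rise and fall in level at the difference frequency
    return abs(f1 - f2)

print(beat(440.0, 442.0))    # 2.0 Hz - the slow wobble you hear while tuning
print(beat(22051, 22050))    # 1 Hz - the over-Nyquist artifact described above
```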

Brick Wall filters in DA.

When mastering tracks to go onto CDs, the material is EQ'ed so that nothing, or very little, over 18-19 kHz reaches the DA converters. Early CDs and CD players were said to be HARSH on the ears, since they used what's called a BRICK WALL filter to cut off all frequencies above a set point, sometimes as low as 15 kHz. I'm sure you have heard people complain that CDs sound cold and harsh; this is true of cheap CD players and older models. The abrupt cut-off wasn't natural, and the ears picked it up even though very few adults can hear past 16 kHz.

I should expand on this, or newbies to digital audio will argue the point: whilst a human cannot hear above, say, 16 kHz, we can sense whether those frequencies are present in a recording or not. High-end CD players actually recreate the harmonics in the DA conversion right up to 30 kHz; Pioneer calls this "Legato Link" technology, if you wish to look it up. A newborn baby can hear up to 20 kHz, and as the baby gets older the ears slowly lose this range. The more loud rock concerts you attend, the less high-frequency hearing you will keep: every time your ears ring and everything seems quiet afterwards, you have done damage to your ears.

If the sound is filtered too much and too steeply it sounds very harsh, and if it is not filtered enough you will get rumbles in the recordings. Some AD and DA converters use 180 kHz brick wall filters to help block RFI and EMI interference; this is one reason some audio interfaces are much quieter than cheap sound cards.

DITHERING

I won't go into too much depth about dithering, since many sites explain it already. Here's a link:

Advanced Dithering explanation

When dithering from 24-bit to 16-bit, the useful information stored in the last 8 bits is pushed up into the top 16 bits, which are the ones we want to keep; truncating then throws away the last 8 bits. If you truncate before dithering, you lose some of the audio information you recorded - i.e. you throw away some of your quality! If you dither before truncating, you are adding small amounts of random noise to the audio to push the low-level audio information up into the top 16 bits. When the digital audio is then truncated, most of that noise is thrown away, although some of it is kept. Dithering gives you much smoother, more pleasant audio to listen to after you have reduced the word length. Read the Noise Shaping section to learn about advanced dithering techniques.
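To make that concrete, here is a minimal Python sketch of dither-then-truncate, assuming samples arrive as 24-bit integers. The sum of two uniform random values gives the triangular (TPDF) dither commonly used; real mastering tools are considerably more sophisticated:

```python
import random

def dither_24_to_16(sample_24: int) -> int:
    """Reduce a 24-bit sample to 16 bits: add TPDF dither, then truncate."""
    lsb = 256  # one 16-bit step, expressed in 24-bit units (2**8)
    # Sum of two uniform randoms -> triangular noise about one 16-bit LSB wide
    noise = random.uniform(-lsb / 2, lsb / 2) + random.uniform(-lsb / 2, lsb / 2)
    dithered = int(round(sample_24 + noise))
    return dithered >> 8   # truncation: throw away the last 8 bits

print(dither_24_to_16(1_000_000))   # ~3906, varying a little from run to run
```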

Noise Shaping

Noise shaping is dithering that takes into account the Fletcher-Munson curves. These curves show the frequency ranges where the human ear is most sensitive and where it is least sensitive. By only adding noise in the areas where our ears hear poorly, or not at all, the noise added in dithering becomes pretty much completely inaudible. This is all the more true because dither normally sits around 90 dB below the maximum level of 16-bit material. There are many different noise shaping techniques in use, and depending on the recorded material one may work better than another; that's one reason why mastering should be left to professional studios, who can dither properly and know which noise shaping technique will work best for your material.

How much dither noise is added ?

Very small amounts are added.

When calculating signal levels and comparing them to dB values, you must use the following formula, because the decibel is a logarithmic scale:

N (dB) = 20 x (log10 A - log10 B) = 20 x log10(A / B)
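In Python that is one line; halving an amplitude, for example, comes out at about -6 dB, which is where the 6-dB-per-bit rule earlier comes from:

```python
import math

def level_db(a: float, b: float) -> float:
    # N(dB) = 20 * (log10 A - log10 B) = 20 * log10(A / B)
    return 20 * math.log10(a / b)

print(level_db(0.5, 1.0))   # ~-6.02 dB: half the amplitude
print(level_db(2.0, 1.0))   # ~+6.02 dB: double the amplitude
```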

This is as far as I am going to go in this tute for now. Hope you learnt heaps and understood most if not all of it. Any comments about this, feel free to email me.

Digital Audio

What is sound?

Sounds are pressure waves of air. If there wasn't any air, we wouldn't be able to hear sounds. There's no sound in space.

We hear sounds because our ears are sensitive to these pressure waves. Perhaps the easiest type of sound wave to understand is a short, sudden event like a clap. When you clap your hands, the air that was between your hands is pushed aside. This increases the air pressure in the space near your hands, because more air molecules are temporarily compressed into less space. The high pressure pushes the air molecules outwards in all directions at the speed of sound, which is about 340 meters per second. When the pressure wave reaches your ear, it pushes on your eardrum slightly, causing you to hear the clap.

A hand clap is a short event that causes a single pressure wave that quickly dies out. In a waveform plot of a typical hand clap, the horizontal axis represents time and the vertical axis represents pressure: the initial high pressure is followed by low pressure, but the oscillation quickly dies out.

The other common type of sound wave is a periodic wave. When you ring a bell, after the initial strike (which is a little like a hand clap), the sound comes from the vibration of the bell. While the bell is still ringing, it vibrates at a particular frequency, depending on the size and shape of the bell, and this causes the nearby air to vibrate with the same frequency. This causes pressure waves of air to travel outwards from the bell, again at the speed of sound. Pressure waves from such continuous vibration form a smooth, repeating (periodic) waveform rather than a single spike.

How is sound recorded?

A microphone consists of a small membrane that is free to vibrate, along with a mechanism that translates movements of the membrane into electrical signals. (The exact electrical mechanism varies depending on the type of microphone.) So acoustical waves are translated into electrical waves by the microphone. Typically, higher pressure corresponds to higher voltage, and vice versa.

A tape recorder translates the waveform yet again - this time from an electrical signal on a wire, to a magnetic signal on a tape. When you play a tape, the process gets performed in reverse, with the magnetic signal transforming into an electrical signal, and the electrical signal causing a speaker to vibrate, usually using an electromagnet.

How is sound recorded digitally ?

Recording onto a tape is an example of analog recording. Audacity deals with digital recordings - recordings that have been sampled so that they can be used by a digital computer, like the one you're using now. Digital recording has a lot of benefits over analog recording. Digital files can be copied as many times as you want, with no loss in quality, and they can be burned to an audio CD or shared via the Internet. Digital audio files can also be edited much more easily than analog tapes.

The main device used in digital recording is an Analog-to-Digital Converter (ADC). The ADC captures a snapshot of the electric voltage on an audio line and represents it as a digital number that can be sent to a computer. By capturing the voltage thousands of times per second, you get a very good approximation of the original audio signal.

If you plot the captured values as dots, each dot represents one audio sample. There are two factors that determine the quality of a digital recording (a short sketch after this list puts the two together):

  • Sample rate: The rate at which the samples are captured or played back, measured in Hertz (Hz), or samples per second. An audio CD has a sample rate of 44,100 Hz, usually written as 44.1 kHz for short. This is also the default sample rate that Audacity uses, because audio CDs are so prevalent.

  • Sample format or sample size: Essentially this is the number of digits in the digital representation of each sample. Think of the sample rate as the horizontal precision of the digital waveform, and the sample format as the vertical precision. An audio CD has a precision of 16 bits, which corresponds to about 5 decimal digits.
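As promised above, here is a small Python sketch that puts the two factors together: it synthesizes one second of a 440 Hz sine tone at the CD sample rate and rounds every sample to 16-bit precision (writing the result to a file is left out to keep it short):

```python
import math

SAMPLE_RATE = 44100        # samples per second (CD quality)
BIT_DEPTH = 16             # bits per sample (CD quality)
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1   # 32767 for 16-bit audio

# One second of a 440 Hz sine wave, one sample per tick of the sample clock
samples = [
    round(FULL_SCALE * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]

print(len(samples), min(samples), max(samples))   # 44100 -32767 32767
```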

Higher sampling rates allow a digital recording to accurately record higher frequencies of sound. The sampling rate should be at least twice the highest frequency you want to represent. Humans can't hear frequencies above about 20,000 Hz, so 44,100 Hz was chosen as the rate for audio CDs to just include all human frequencies. Sample rates of 96 and 192 kHz are starting to become more common, particularly in DVD-Audio, but many people honestly can't hear the difference.

Higher sample sizes allow for more dynamic range - louder louds and softer softs. If you are familiar with the decibel (dB) scale, the dynamic range of an audio CD is theoretically about 96 dB, but realistically signals more than about 24 dB below full scale are noticeably reduced in quality. Audacity supports two additional sample sizes: 24-bit, which is commonly used in digital recording, and 32-bit float, which has almost infinite dynamic range and only takes up twice as much storage as 16-bit samples.

Playback of digital audio uses a Digital-to-Analog Converter (DAC). It takes each sample and sets the corresponding voltage on the analog output, recreating the signal that the Analog-to-Digital Converter originally measured. The DAC does this as faithfully as possible, and the first CD players did only that, which didn't sound good at all. Nowadays DACs use oversampling to smooth out the audio signal. The quality of the filters in the DAC also contributes to the quality of the recreated analog audio signal; the filter is just one of a multitude of stages that make up a DAC.

How does audio get digitized on your computer?

Your computer has a soundcard - it could be a separate card, like a SoundBlaster, or it could be built-in to your computer. Either way, your soundcard comes with an Analog-to-Digital Converter (ADC) for recording, and a Digital-to-Analog Converter (DAC) for playing audio. Your operating system (Windows, Mac OS X, Linux, etc.) talks to the sound card to actually handle the recording and playback, and Audacity talks to your operating system so that you can capture sounds to a file, edit them, and mix multiple tracks while playing.

Standard file formats for PCM audio

There are two main types of audio files on a computer:

  • PCM stands for Pulse Code Modulation. This is just a fancy name for the technique described above, where each number in the digital audio file represents exactly one sample in the waveform. Common examples of PCM files are WAV files, AIFF files, and Sound Designer II files. Audacity supports WAV, AIFF, and many other PCM files.

  • The other type is compressed files. Earlier formats used logarithmic encodings to squeeze more dynamic range out of fewer bits for each sample, like the u-law or a-law encoding in the Sun AU format. Modern compressed audio files use sophisticated psychoacoustic algorithms to represent the essential frequencies of the audio signal in far less space. Examples include MP3 (MPEG I, layer 3), Ogg Vorbis, and WMA (Windows Media Audio). Audacity supports MP3 and Ogg Vorbis, but not the proprietary WMA format or the MPEG4 format (AAC) used by Apple's iTunes.

For details on the audio formats Audacity can import from and export to, please check out the Fileformats page of this documentation. Please remember that MP3 does not store uncompressed PCM audio data: when you create an MP3 file, you are deliberately losing some quality in order to use less disk space.
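Since WAV is the PCM file you are most likely to meet, here is a short sketch that uses Python's standard-library wave module to inspect one. The filename echoes the hello.wav example from the formats section, and the unpacking assumes 16-bit PCM:

```python
import struct
import wave

# Open a PCM WAV file and print its vital statistics
with wave.open("hello.wav", "rb") as wav:
    print("channels:   ", wav.getnchannels())
    print("sample rate:", wav.getframerate(), "Hz")
    print("sample size:", wav.getsampwidth() * 8, "bits")
    print("frames:     ", wav.getnframes())

    # For 16-bit PCM, every sample is one little-endian signed short
    raw = wav.readframes(10)
    samples = struct.unpack("<" + "h" * (len(raw) // 2), raw)
    print("first samples:", samples)
```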

Basic Audio

Introduction to Audio

This beginner-level tutorial covers the basics of audio production. It is suitable for anyone wanting to learn more about working with sound, in either amateur or professional situations. The tutorial takes about 20 minutes to complete.

What is "Audio"?

Audio means "of sound" or "of the reproduction of sound". Specifically, it refers to the range of frequencies detectable by the human ear — approximately 20Hz to 20kHz. It's not a bad idea to memorise those numbers — 20Hz is the lowest-pitched (bassiest) sound we can hear, 20kHz is the highest pitch we can hear.

Audio work involves the production, recording, manipulation and reproduction of sound waves. To understand audio you must have a grasp of two things:

  1. Sound Waves: What they are, how they are produced and how we hear them.
  2. Sound Equipment: What the different components are, what they do, how to choose the correct equipment and use it properly.

Fortunately it's not particularly difficult. Audio theory is simpler than video theory and once you understand the basic path from the sound source through the sound equipment to the ear, it all starts to make sense.

Technical note: In physics, sound is a form of energy known as acoustical energy.

The Field of Audio Work

The field of audio is vast, with many areas of specialty. Hobbyists use audio for all sorts of things, and audio professionals can be found in a huge range of vocations. Some common areas of audio work include:

  • Studio Sound Engineer
  • Live Sound Engineer
  • Musician
  • Music Producer
  • DJ
  • Radio technician
  • Film/Television Sound Recordist
  • Field Sound Engineer
  • Audio Editor
  • Post-Production Audio Creator

In addition, many other professions require a level of audio proficiency. For example, video camera operators should know enough about audio to be able to record good quality sound with their pictures.

Speaking of video-making, it's important to recognise the importance of audio in film and video. A common mistake amongst amateurs is to concentrate only on the vision and assume that as long as the microphone is working the audio will be fine. However, satisfactory audio requires skill and effort. Sound is critical to the flow of the programme — indeed in many situations high quality sound is more important than high quality video.

Most jobs in audio production require some sort of specialist skill set, whether it be micing up a drum kit or creating synthetic sound effects. Before you get too carried away with learning specific tasks, you should make sure you have a general grounding in the principles of sound. Once you have done this homework you will be well placed to begin specialising.

The first thing to tackle is basic sound wave theory...