Audio recording in theory

From Canadian Centre for Ethnomusicology

Recording sound in theory

General properties of equipment components

  • Dynamic range: ratio between the largest signal a component can handle without distortion and the smallest signal distinguishable from its noise
  • Frequency response: ratio of output power to input power across range of frequencies, with constant input power
  • Noise
  • Weight and size
  • Durability
  • Power requirements
  • Cost


  • Electrical specifications
    • Unbalanced line: prone to picking up noise
      • Two-wire cable: hot, and ground (shield)
      • hot - ground = signal + noise
      • Advantage: cheap, simple, fine for short distances
      • Disadvantage: noise may be introduced via stray electromagnetic fields on long cables
    • Balanced line: noise picked up equally by both signal wires cancels at the receiving end
      • Three-wire cable: hot, cold, and ground (shield)
      • Signal is encoded on both hot and cold
      • hot - ground = signal/2 + noise
      • cold - ground = - signal/2 + noise
      • (hot - ground) - (cold - ground) = hot - cold = signal
      • Take care: hot and cold must be wired to the same pins at both ends; otherwise the polarity is inverted and signals may cancel when combined!
    • Understand the issue with a number game:
      • A wants to tell C a number X by whispering it to B, who passes it on to C. But B always adds a number N (noise) that C does not know. C hears X+N and therefore cannot recover the original number!
      • Trick: B adds the same number N to every number whispered at the same moment.
      • So A tells B both X/2 and its negative, -X/2. C receives X/2 + N and -X/2 + N, and subtracts the second from the first. Voila: X appears!
  • Connectors and connections
    • Balanced lines are typically carried by XLR connectors (3 pins/wires); a quarter-inch tip-ring-sleeve (TRS) plug can also carry one balanced signal
    • Unbalanced lines are carried by quarter-inch, miniplug, or RCA connectors (2 contacts per signal)
    • Stereo versions add an extra signal, thus 5-6 wires (balanced) or 3-4 wires (unbalanced), depending on whether ground is repeated.
    • Thus a stereo quarter-inch plug has three contacts: tip (left), ring (right), and sleeve (ground)
    • Be sure ground is connected to ground
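The arithmetic behind the balanced line (and the number game) can be sketched in a few lines of Python. This is only an illustration under idealized assumptions: the function names are hypothetical, and the noise is assumed to hit hot and cold identically, which real cables only approximate.

```python
import random

def transmit_unbalanced(signal, noise):
    # hot - ground = signal + noise: the noise stays in the result
    return [s + n for s, n in zip(signal, noise)]

def transmit_balanced(signal, noise):
    # Each wire carries half the signal (with opposite sign) plus the same noise
    hot = [s / 2 + n for s, n in zip(signal, noise)]
    cold = [-s / 2 + n for s, n in zip(signal, noise)]
    # The receiver takes hot - cold, so the shared noise term drops out
    return [h - c for h, c in zip(hot, cold)]

random.seed(0)  # fixed seed so the illustration is repeatable
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
noise = [random.uniform(-0.2, 0.2) for _ in signal]  # assumed interference

unbalanced = transmit_unbalanced(signal, noise)
balanced = transmit_balanced(signal, noise)
```

Comparing `balanced` with `signal` shows the original values recovered (up to floating-point rounding), while `unbalanced` still contains the noise.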


The microphone is a kind of transducer, converting pressure waves into electrical signals by means of a diaphragm that vibrates with incident sound waves, generating an electrical current. The output is usually a low voltage (thousandths of a volt), which must be boosted to line level (in the range of about 0.5 volts to 2 volts).
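To get a feel for how much boosting this implies, the voltage ratio can be expressed in decibels (20 times the base-10 logarithm of the ratio). The sketch below assumes, purely for illustration, a microphone output of 2 mV and a line level of 1 V; both figures and the function name are assumptions, not specifications of any particular device.

```python
import math

def voltage_gain_db(v_in, v_out):
    """Gain in decibels needed to raise voltage v_in to v_out."""
    return 20 * math.log10(v_out / v_in)

mic_level_volts = 0.002   # assumed: 2 thousandths of a volt
line_level_volts = 1.0    # assumed: within the 0.5 V to 2 V range

gain_db = voltage_gain_db(mic_level_volts, line_level_volts)
print(f"Required preamp gain: {gain_db:.1f} dB")  # roughly 54 dB
```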

  • Transduction technology
    • Condenser (capacitor) microphone (capacitance change)
      • Advantage: stronger signal, better frequency response
      • Disadvantage: requires power (battery or phantom), more delicate, poor choice for loud sound
    • Dynamic microphone (electromagnetic induction, like a reversed speaker).
      • Advantage: Simpler, sturdier, fewer moving parts, no power required, good for loud sounds
      • Disadvantage: Tend not to have flat frequency response, weak signals
    • Piezoelectric microphone (piezoelectric generation)
  • Directional pattern
    • Omnidirectional
    • Cardioid (front and sides)
    • Bidirectional
    • Hypercardioid
    • Shotgun (too conspicuous!)
    • Variable (some microphones provide a switch, e.g. to match video zoom)
  • Special types
    • Lavalier mic
    • Wireless mic
    • Contact mic
    • Stereo mic
    • Digital mic: converts straight to digital, and plugs into your computer, often via a USB port. (Convenient, but not the best A/D conversion.)
  • Connectors
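    • XLR (professional): larger, but shielded, balanced, better connection
    • TRS (tip-ring-sleeve, both quarter-inch and miniature): usually wired unbalanced (stereo) on consumer equipment, though one TRS plug can carry a single balanced signal
    • RCA: unbalanced (the outer contact serves as both shield and signal return)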
    • USB
  • Impedance
    • Low: under 600 ohms (most condenser mics are under 200 ohms)
    • High: over 10k ohms
    • Low impedance is preferred. High impedance will not perform well with longer cables.
    • Connect mic to an input of equal or (preferably) higher impedance
  • Frequency response
    • Plot of dB vs Hertz (input signal), where dB measures the ratio of output to input
    • Flat is generally best, but at least match to expected frequency range
    • Condenser responses are generally flatter than dynamic
  • Dynamic range
    • Essentially the ratio of the loudest sound the mic can handle without distortion to its noise floor
  • Accessories
    • Windscreen
    • Mic mount (reduces vibration)
    • Stand
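The impedance advice above can be illustrated with a simple voltage-divider model. This is a rough sketch that treats the microphone and the input as pure resistances and ignores cable capacitance; the function name and the example values are assumptions chosen for illustration.

```python
import math

def loading_loss_db(source_ohms, load_ohms):
    """Signal level change (dB) when a source drives a load (voltage divider)."""
    fraction = load_ohms / (source_ohms + load_ohms)
    return 20 * math.log10(fraction)

# A 200-ohm mic into an input of equal impedance: half the voltage is lost
equal_match = loading_loss_db(200, 200)

# The same mic into an assumed 2000-ohm input: most of the voltage arrives
higher_input = loading_loss_db(200, 2000)

print(f"equal impedance:  {equal_match:.2f} dB")
print(f"higher impedance: {higher_input:.2f} dB")
```

In this model an equal-impedance input costs about 6 dB of level, while an input ten times higher costs under 1 dB, which is one reason a higher input impedance is usually preferred.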


  • The speaker is the microphone's opposite in the record/playback chain.
  • Applications:
    • Feedback to informants
    • Transcription
    • Editing
  • Headphones vs. monitors

Digital recording

  • Analog recordings can be subsequently digitized.
  • To make a real-time digital recording, an analog mic signal passes through an A/D unit (sometimes this is contained inside the mic itself), to be stored in digital media.

Digital audio data rates

The net bit rate for audio is a function of

  • bit depth
  • sampling rate
  • CODEC compression (if any)

Very roughly speaking, remember that an audio CD (uncompressed audio) is around 650 MB; a high quality MP3 version is about a tenth that size, while the highest quality audio is 2x-3x bigger.

Remember to differentiate units (bit rate vs. byte rate).

For some sample and illustrative calculations see: File size calculations.
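As a quick illustration of the bit rate vs. byte rate distinction, the uncompressed rate is simply sampling rate × bit depth × number of channels. The sketch below uses hypothetical function names; the CD parameters (44.1 kHz, 16 bits, stereo) are standard, and the 96 kHz / 24-bit figures are just one assumed example of a higher-quality format.

```python
def pcm_bit_rate(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Uncompressed PCM data size in bytes (8 bits per byte)."""
    return pcm_bit_rate(sample_rate_hz, bit_depth, channels) * seconds // 8

cd_bit_rate = pcm_bit_rate(44_100, 16, 2)             # bits per second
cd_byte_rate = cd_bit_rate // 8                       # bytes per second
cd_hour_bytes = pcm_size_bytes(44_100, 16, 2, 3600)   # one hour of audio

# Assumed higher-quality example: 96 kHz, 24 bits, stereo
hires_ratio = pcm_bit_rate(96_000, 24, 2) / cd_bit_rate

print(cd_bit_rate, cd_byte_rate, cd_hour_bytes, round(hires_ratio, 2))
```

One hour of CD-quality stereo comes to roughly 635 million bytes, which is consistent with the rough 650 MB figure above.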

Recorder type

  • Computer to HD
  • Dedicated hard drive recorder
  • Flash memory (SD cards, etc.) recorder
  • Minidisc, Hi Minidisc


  • A/D converter: quality is important (consider using an outboard converter)
  • Sampling rate
    • Sampling rate should be greater than twice the highest signal frequency
    • Aliasing: frequencies above half the sampling rate appear as lower frequencies, because they're undersampled.
    • Nyquist frequency (= half the sampling rate) is the highest frequency that can be sampled without aliasing problems.
    • Therefore A/D converters must low-pass filter the signal before sampling.
    • Human hearing is limited to about 20 kHz, which means the signal can be filtered to 20 kHz max without much perceptual loss.
    • In this case the sampling rate must exceed 40 kHz. Typical rates:
      • 44.1 kHz (used for CDs)
      • 48 kHz
      • 96 kHz
      • 192 kHz
  • Bit depth
    • More bits increase dynamic range and decrease quantization noise, but increase file size
    • 8 bits (49.93 dB SQNR)
    • 16 bits (98.09 dB) (used for CDs)
    • 20 bits (122.17 dB)
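The sampling-rate and bit-depth points above can be checked numerically. The following is a minimal sketch with hypothetical function names: it assumes a pure tone sampled with no anti-aliasing filter, and the usual idealized SQNR formula for a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB), which reproduces the figures listed.

```python
import math

def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be sampled without aliasing."""
    return sample_rate_hz / 2

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling (no filtering)."""
    folded = signal_hz % sample_rate_hz
    # Frequencies above Nyquist fold back ("mirror") into the lower half
    return min(folded, sample_rate_hz - folded)

def sqnr_db(bits):
    """Idealized signal-to-quantization-noise ratio for a full-scale sine."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

print(nyquist_frequency(44_100))          # 22050.0
print(alias_frequency(30_000, 44_100))    # a 30 kHz tone shows up at 14100 Hz
for bits in (8, 16, 20):
    print(bits, round(sqnr_db(bits), 2))
```

A 10 kHz tone is below the Nyquist frequency and is left unchanged by `alias_frequency`, whereas the 30 kHz tone is reported at a lower, audible frequency, which is why filtering before sampling matters.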

Setting levels

  • Automatic level control
  • Limiter
  • Manual level control
  • Danger: clipping!
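To make the clipping danger concrete, here is a small sketch that measures peak level relative to digital full scale and counts samples stuck at the limit. It assumes samples are floats normalized so that full scale is ±1.0; the function names and the test tone are hypothetical, and real recorders meter levels in their own ways.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (0 dBFS = maximum)."""
    peak = max(abs(s) for s in samples)
    return -math.inf if peak == 0 else 20 * math.log10(peak)

def hard_clip(samples, limit=1.0):
    """What a converter does to a signal that is too hot: flatten the peaks."""
    return [max(-limit, min(limit, s)) for s in samples]

def count_clipped(samples, limit=1.0):
    """Number of samples at or beyond full scale."""
    return sum(1 for s in samples if abs(s) >= limit)

# One cycle of a sine wave recorded 50% too loud (assumed example)
too_hot = [1.5 * math.sin(2 * math.pi * k / 16) for k in range(16)]
recorded = hard_clip(too_hot)

print(peak_dbfs(recorded), count_clipped(recorded))
```

Once the peaks are flattened the original waveform cannot be recovered, so with manual level control it is common to leave some headroom below 0 dBFS.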

Uncompressed audio

Use Pulse Code Modulation (PCM) to encode the analog signal

  • wav (pc)
  • aiff (mac)
  • bwf (Broadcast Wave Format, also written bwav; a wav file that incorporates metadata)
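As an illustration of PCM in a wav container, the sketch below generates a short sine tone, quantizes it to 16-bit samples, and writes it with Python's standard `wave` module. The helper names, the 440 Hz tone, and the in-memory buffer are assumptions for the example; writing to a real file only requires passing a filename instead.

```python
import io
import math
import struct
import wave

def sine_pcm16(freq_hz, seconds, sample_rate=44_100, amplitude=0.5):
    """Return a mono sine tone as little-endian 16-bit PCM bytes."""
    frame_count = round(seconds * sample_rate)
    frames = bytearray()
    for k in range(frame_count):
        value = amplitude * math.sin(2 * math.pi * freq_hz * k / sample_rate)
        # Quantize the sample to a signed 16-bit integer
        frames += struct.pack("<h", int(round(value * 32767)))
    return bytes(frames)

def write_wav(target, pcm_bytes, sample_rate=44_100, channels=1):
    """Write 16-bit PCM data to a filename or writable binary file object."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)

buffer = io.BytesIO()                # in-memory stand-in for a file
write_wav(buffer, sine_pcm16(440, 0.1))
wav_bytes = buffer.getvalue()
```

The resulting bytes begin with the `RIFF` and `WAVE` markers, followed by the format description and the raw PCM samples.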

Lossless compressed audio

  • flac

Lossy compressed audio

  • mp3
  • ogg (typically Vorbis)
  • mp4 (typically AAC)
  • wma

Considerations when designing a recording system

  • Cost
    • Cost of equipment
    • Cost of repairs
    • Cost of supplies, per minute of recording
    • Cost of power, per minute (batteries)
  • Quality (Signal/noise, distortion)
    • S/N ratio
    • Dynamic range
    • Frequency response
    • Other sources of distortion
  • Durability
    • Durability of equipment (given environmental problems: heat, dust, moisture, light, shock, etc.)
    • Stability of the medium, durability of the recording itself
  • Feasibility (practicality)
    • Portability of recorder (weight & size)
    • Compatibility of equipment types (e.g. computer and recorder)
    • Conspicuousness: smaller equipment is less conspicuous but may make poorer recordings; e.g. a shotgun mic may be technically useful but socially useless
    • Practicality of supplies (availability)
    • Practicality of power source (availability, voltage, etc.)
    • Ease of working with the medium (copying, transferring, editing, etc.)
    • Amount of storage (tape, HD, etc.) required (e.g. tape length, and total volume/weight of the media to be carried)
    • Duplication ease and generational loss
    • Ability to handle a range of situations (interview, music, inside, outside, etc.)
    • Ability to combine with video recording is a plus!