Audio signal processing

Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. As audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on the digital representation of that signal.

History

Audio signals are electronic representations of sound waves: longitudinal waves that travel through air, consisting of compressions and rarefactions. The energy contained in audio signals is typically measured in decibels. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links.^[1]

Analog signals

Analog indicates something that is mathematically represented by a continuous function. Thus, an analog signal is one represented by a continuous stream of data, in this case along an electrical circuit in the form of voltage or current. Analog signal processing then involves physically altering the continuous signal by changing the voltage or current or charge via various electrical means.

Historically, before the advent of widespread digital technology, analog was the only method by which to manipulate a signal. Since that time, as computers and software have become more capable and affordable, digital signal processing has become the method of choice.

Digital signals

A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach, as the techniques of digital signal processing are much more powerful and efficient than analog-domain signal processing.^[2]
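As an illustration of how a waveform becomes a sequence of binary numbers, the following minimal sketch samples a sine wave at regular intervals and rounds each value to a signed integer code. The function name sample_and_quantize, the 16-bit word length and the 8 kHz sample rate are assumptions chosen for the example, not values taken from the text above.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bits=16):
    """Sample a unit-amplitude sine wave and quantize it to signed integers."""
    max_code = 2 ** (bits - 1) - 1  # largest positive code, 32767 for 16 bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                   # time of the n-th sample in seconds
        x = math.sin(2 * math.pi * freq_hz * t)  # continuous-valued amplitude in [-1, 1]
        codes.append(round(x * max_code))        # nearest integer code
    return codes

# One period of a 1 kHz tone sampled at 8 kHz gives eight integer codes.
codes = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000, num_samples=8)
print(codes)
```

Once the signal is held as a list of integers like this, any further processing is ordinary arithmetic on those numbers.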

Application areas

Processing methods and application areas include storage, data compression, music information retrieval, speech processing, localization, acoustic detection, transmission, noise cancellation, acoustic fingerprinting, sound recognition, synthesis, and enhancement (e.g. equalization, filtering, level compression, and echo and reverb removal or addition).

Audio broadcasting

Arguably the most important audio processing in audio broadcasting takes place just before the transmitter. The audio processor here must

prevent or minimize overmodulation,
compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting) and
adjust overall loudness to the desired level.
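The first and third of these tasks can be sketched in a few lines. The example below scales a signal to a target RMS level and then limits any sample that would exceed a ceiling, standing in for the modulation limit. The function names, the target of 0.2 and the ceiling of 1.0 are assumptions made for illustration; real broadcast processors are considerably more elaborate, and compensation for non-linear transmitters is not attempted here.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def level_and_limit(samples, target_rms=0.2, ceiling=1.0):
    """Scale to a target RMS level, then hard-limit peaks to the ceiling."""
    current = rms(samples)
    gain = target_rms / current if current > 0 else 1.0  # leave silence unchanged
    return [max(-ceiling, min(ceiling, x * gain)) for x in samples]

sample_rate_hz = 8000
# A very quiet 440 Hz tone: the gain stage raises it to the target level.
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / sample_rate_hz) for n in range(800)]
processed = level_and_limit(quiet)

# A mostly silent signal with one large peak: the limiter catches the peak.
spiky = [0.0] * 99 + [1.0]
limited = level_and_limit(spiky)
```

Hard limiting is the crudest way to stay under the ceiling and introduces distortion of its own; it is used here only because it is short.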

Active noise control

Active noise control is a technique designed to reduce unwanted sound. By creating a signal that is identical to the unwanted noise but with the opposite polarity, the two signals cancel out due to destructive interference.
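The cancellation principle can be shown numerically. In this minimal sketch the "anti-noise" is simply the noise with its sign flipped, so the sum of the two is zero; the second case delays the anti-noise by one sample to show that cancellation degrades when the two signals are not aligned. The names and the 100 Hz test tone are assumptions for the example, and a practical system would have to estimate the noise adaptively rather than know it in advance.

```python
import math

def anti_noise(noise):
    """Return the same signal with opposite polarity."""
    return [-x for x in noise]

sample_rate_hz = 8000
noise = [0.5 * math.sin(2 * math.pi * 100 * n / sample_rate_hz) for n in range(80)]
inverted = anti_noise(noise)

# Perfectly aligned: every sample cancels exactly.
residual = [a + b for a, b in zip(noise, inverted)]

# Anti-noise arriving one sample late: cancellation is no longer complete.
late = [0.0] + inverted[:-1]
residual_late = [a + b for a, b in zip(noise, late)]

print(max(abs(r) for r in residual), max(abs(r) for r in residual_late))
```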

Audio synthesis

Audio synthesis is the electronic generation of audio signals. A musical instrument that accomplishes this is called a synthesizer. Synthesizers can either imitate sounds or generate new ones. Audio synthesis is also used to generate human speech using speech synthesis.
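One simple way to generate an audio signal is to add sine waves at multiples of a fundamental frequency. The sketch below uses this additive approach purely as an illustration, since the text above does not single out any particular synthesis method; the function name additive_tone and the harmonic amplitudes are assumptions.

```python
import math

def additive_tone(fundamental_hz, harmonic_amps, sample_rate_hz, num_samples):
    """Sum sine waves at integer multiples of the fundamental frequency.

    harmonic_amps[0] is the amplitude of the fundamental,
    harmonic_amps[1] that of the second harmonic, and so on.
    """
    out = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        out.append(sum(
            amp * math.sin(2 * math.pi * (k + 1) * fundamental_hz * t)
            for k, amp in enumerate(harmonic_amps)
        ))
    return out

# A 220 Hz tone with progressively weaker second and third harmonics.
tone = additive_tone(220, [1.0, 0.5, 0.25], sample_rate_hz=8000, num_samples=800)
```

Changing the relative amplitudes in harmonic_amps changes the timbre of the tone while leaving its pitch the same.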

Audio effects

Audio effects are systems designed to alter how an audio signal sounds. Unprocessed audio is metaphorically referred to as "dry", while processed audio is referred to as "wet".^[3]

echo - to simulate the effect of reverberation in a large hall or cavern, one or several delayed signals are added to the original signal. To be perceived as echo, the delay has to be on the order of 35 milliseconds or above. Short of actually playing a sound in the desired environment, the effect of echo can be implemented using either digital or analog methods. Analog echo effects are implemented using tape delays or spring reverbs. When large numbers of delayed signals are mixed over several seconds, the resulting sound has the effect of being presented in a large room, and it is more commonly called reverberation, or reverb for short.
flanger - to create an unusual sound, a delayed signal is added to the original signal with a continuously variable delay (usually smaller than 10 ms). This effect is now done electronically using DSP, but originally it was created by playing the same recording on two synchronized tape players and mixing the signals together. As long as the machines were synchronized, the mix would sound more or less normal, but if the operator placed a finger on the flange of one of the players (hence "flanger"), that machine would slow down and its signal would fall out of phase with its partner, producing a phasing effect. Once the operator removed the finger, the player would speed up until its tachometer was back in phase with the master, and as this happened, the phasing effect would appear to slide up the frequency spectrum. This phasing up and down the register can be performed rhythmically.
phaser - another way of creating an unusual sound; the signal is split, a portion is filtered with an all-pass filter to produce a phase-shift, and then the unfiltered and filtered signals are mixed. The phaser effect was originally a simpler implementation of the flanger effect since delays were difficult to implement with analog equipment. Phasers are often used to give a "synthesized" or electronic effect to natural sounds, such as human speech. The voice of C-3PO from Star Wars was created by taking the actor's voice and treating it with a phaser.
chorus - a delayed signal is added to the original signal with a constant delay. The delay has to be short in order not to be perceived as echo, but above 5 ms to be audible. If the delay is too short, it will destructively interfere with the un-delayed signal and create a flanging effect. Often, the delayed signals will be slightly pitch shifted to more realistically convey the effect of multiple voices.
equalization - different frequency bands are attenuated or boosted to produce desired spectral characteristics. Moderate use of equalization (often abbreviated as "EQ") can be used to "fine-tune" the tone quality of a recording; extreme use of equalization, such as heavily cutting a certain frequency, can create more unusual effects.
filtering - Equalization is a form of filtering. In the general sense, frequency ranges can be emphasized or attenuated using low-pass, high-pass, band-pass or band-stop filters. Band-pass filtering of voice can simulate the effect of a telephone because telephones use band-pass filters.
overdrive effects such as the use of a fuzz box can be used to produce distorted sounds, such as for imitating robotic voices or to simulate distorted radiotelephone traffic (e.g., the radio chatter between starfighter pilots in the science fiction film Star Wars). The most basic overdrive effect involves clipping the signal when its absolute value exceeds a certain threshold.
pitch shift - this effect shifts a signal up or down in pitch. For example, a signal may be shifted an octave up or down. This is usually applied to the entire signal, and not to each note separately. Blending the original signal with shifted duplicate(s) can create harmonies from one voice. Another application of pitch shifting is pitch correction. Here a musical signal is tuned to the correct pitch using digital signal processing techniques. This effect is ubiquitous in karaoke machines and is often used to assist pop singers who sing out of tune. It is also used intentionally for aesthetic effect in such pop songs as Cher's Believe and Madonna's Die Another Day.
time stretching - the complement of pitch shift, that is, the process of changing the speed of an audio signal without affecting its pitch.
resonators - emphasize harmonic frequency content on specified frequencies. These may be created from parametric EQs or from delay-based comb-filters.
robotic voice effects are used to make an actor's voice sound like a synthesized human voice.
modulation - to change the frequency or amplitude of a carrier signal in relation to a predefined signal. Ring modulation, which is closely related to amplitude modulation, is an effect made famous by Doctor Who's Daleks and commonly used throughout sci-fi.
compression - the reduction of the dynamic range of a sound to avoid unintentional fluctuation in the dynamics. Level compression is not to be confused with audio data compression, where the amount of data is reduced without affecting the amplitude of the sound it represents.
3D audio effects - place sounds outside the stereo basis
reverse echo - a swelling effect created by reversing an audio signal and recording echo and/or delay while the signal runs in reverse. When played back forward, the last echoes are heard before the effected sound, creating a rush-like swell preceding and during playback. Jimmy Page of Led Zeppelin used this effect in the bridge of "Whole Lotta Love".^[4]^[5]^[6]
wave field synthesis - a spatial audio rendering technique for the creation of virtual acoustic environments
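Several of the effects listed above reduce to a few lines of arithmetic on a list of samples. The following minimal sketch covers three of them: a single echo (one delayed copy added to the original), the most basic overdrive (clipping above a threshold), and ring modulation (multiplying the signal by a sine-wave carrier). The function names, the mix of 0.5, the clipping threshold of 0.3 and the 30 Hz carrier are assumptions chosen for illustration, not values from the descriptions above.

```python
import math

def echo(samples, delay_samples, mix=0.5):
    """Add one delayed, attenuated copy of the signal to the original."""
    out = list(samples) + [0.0] * delay_samples  # extra room for the echo tail
    for n, x in enumerate(samples):
        out[n + delay_samples] += mix * x
    return out

def hard_clip(samples, threshold):
    """Basic overdrive: clip samples whose absolute value exceeds threshold."""
    return [max(-threshold, min(threshold, x)) for x in samples]

def ring_modulate(samples, carrier_hz, sample_rate_hz):
    """Multiply the input by a sine-wave carrier."""
    return [x * math.sin(2 * math.pi * carrier_hz * n / sample_rate_hz)
            for n, x in enumerate(samples)]

sample_rate_hz = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate_hz) for n in range(800)]

# A 35 ms delay, the approximate lower bound given above for a perceptible echo.
echoed = echo(tone, delay_samples=round(0.035 * sample_rate_hz))
distorted = hard_clip(tone, threshold=0.3)
modulated = ring_modulate(tone, carrier_hz=30, sample_rate_hz=sample_rate_hz)
```

Shortening the delay passed to echo to a few milliseconds and varying it over time would move this toward the flanger and chorus effects described above.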

References

↑ Spanias, Andreas; Painter, Ted; Atti, Venkatraman (2006). Audio signal processing and coding (Online ed.). Hoboken, NJ: John Wiley & Sons. p. 464. ISBN 0-471-79147-4.
↑ Zölzer, Udo (1997). Digital Audio Signal Processing. John Wiley and Sons. ISBN 0-471-97226-6.
↑ Hodgson, Jay (2010). Understanding Records. p. 95. ISBN 978-1-4411-5607-5.
↑ "WHOLE LOTTA LOVE by LED ZEPPELIN". Retrieved 5 January 2018.
↑ O'Neil, Bill. "Page's Studio Tricks III (Backwards echo)". Retrieved 5 January 2018.
↑ Audrey. "The History of Reverse Reverb". Retrieved 5 January 2018.