
Build a Sound System - 1B - Audio Sources and Amplification

Written by: dreamdust
Written on: Sat Sep 12, 2009 11:36 am
Article Description:


This second section will investigate methods of creating/manipulating audio for the purpose of amplification.

Audio sources - string, reed, percussive, brass, voice and synthetic.

We create and manipulate audio in various ways, but all share the common goal of producing audio waves that are pleasing to listen to. (The subjective nature of pleasure, and of dissonance and consonance, is another subject entirely, and not one I am going to discuss here. Whatever we define as pleasure, though, pleasure is the common goal of music in all its forms.) Over the centuries, we have devised ever more complex and intricate methods of creating this audio, and of manipulating the physical world to transmit it. It is generally accepted that the first instrument used for creating sound was percussive in nature, so we will start here:

The drum.

This category, which includes essentially any instrument of a percussive nature (from two sticks banged together all the way to a full orchestral percussion section), works by using the elastic properties of a solid to set up a transverse waveform. This is usually accomplished by striking the instrument with another solid (typically a drumstick of some form). The resulting deformation of the surface causes a transverse "ripple" to radiate from the point of impact, producing a longitudinal pressure variation in the air perpendicular to the surface (a sound wave). The tonal characteristics are altered by the nature of the materials used, whether they are fixed at the ends/edges (as with a drum) or free (as in a glockenspiel), and a number of other factors which are discussed here. As with most physical laws, they are essentially quite straightforward taken singly, but become very complex very quickly when interacting (which is one of the reasons percussive sounds are so difficult to synthesize properly - something we will touch on later). This use of a transverse deformation of a solid to create a longitudinal pressure variation is the basis of pretty much all acoustic sound, though.

String, Voice and Brass.

I am putting these three methods into one section because they create sound waves in very similar ways (the minor differences in the creation of the sound combine with the differences in resonating chambers etc. to produce very different tonal characteristics, but the original creation of the sound waves is very similar). Basically they take a solid vibrator that is fixed at both ends and cause it to produce transverse waves over its length. With string instruments, it is a taut string, the tone varied by the tension and length of the string. In the human voice, the twin infoldings of mucous membrane at the base of the larynx, known as the vocal folds or "vocal cords", do the job, the tone varied by the tension created by the muscles on either side. With brass instruments, the vibration is caused by air passing through the lips of the player, the tone varied by the tension of the lips. These similar beginnings then have sound added, taken away, distorted etc. by various methods (from the interaction of waves reflecting at both fixed ends, to the properties of the various resonating chambers used, to the differences inherent in the materials causing the sound) to produce vastly different sounds - so much so that you would hardly credit a guitar and a trumpet with having anything in common other than that both make a noise. But as noted earlier, the physical laws, essentially straightforward taken singly, become very complex very quickly, and are capable of creating huge variation from similar beginnings.

Reed, or Woodwind.

The method of creating sound waves in a "reed" type instrument is quite similar to that used by string, voice and brass; the main difference (in the initial creation of the wave) is that the vibrating membrane is fixed at only one end, and therefore has different properties from one fixed at both ends. Note that although instruments like the flute do not have reeds as such, the vibration is initiated by what is referred to as an edge tone, and the mechanism is similar to that of a reed.

With all these methods, the variations in characteristics which make one so totally different from the others begin with the interactions between the incident and reflected transverse waves, and are then modified by a varied set of resonating chambers etc. to produce huge variation in the resulting characteristics. My purpose in indicating the similarities is not to claim that all instruments sound the same (far from it), but to point out that the huge variations are created from a similar source - a fact that is important both to the understanding of synthesized sound, and to the effective amplification of sound. The action of sounding boards, resonant chambers, resonating tubes and horns will be discussed further in section 1C.

Electrically amplified sound, and Synthetic sound.

While electrically amplified sound is not a sound source in itself, a very basic understanding of it is necessary to understand synthetic, or synthesized, sound (we will go into it in detail in later sections). For the purposes of understanding synthesized sound, you need to understand how audio is transmitted electronically. The basic form uses a diaphragm to drive a piston, at the end of which is a material that exhibits piezoelectric characteristics, thus converting the kinetic motion of the sound into an electrical representation of the same wave.

[Image: piezodiaphragm.gif - a diaphragm driving a piezoelectric element to convert sound into an electrical signal]


This representation (or analogue) of the sound can then be transmitted and amplified electronically before being converted back to kinetic energy (the sound wave) through the use of a speaker driver of some sort. The audio wave information is not actually altered, but the form of energy used for transmission is changed - which is why it is referred to as an analogue of the sound: the information is represented by a continuously variable physical quantity which can be measured, and therefore converted back. The important fact is that it needs to be converted to kinetic energy to be heard - you cannot hear an electrical signal.

This ability to "encode" audio in electrical impulses and then convert it back to kinetic audio led to the invention of synthesis - basically the creation of an oscillation in the electrical domain, which can then be manipulated by interacting with other oscillations, filtered, amplified and generally played with to create unique sounds unobtainable by normal acoustic means. There are many methods of synthesis, but all are based on causing these electrical impulses to interact in some way. It is interesting to note that while it is impossible to recreate some of the sounds of a synthesizer by acoustic means, it is also impossible (or at least extremely difficult) to recreate an acoustic instrument in any satisfactory way by synthesized means - the most effective method employed to date has been to record the sounds made by the instrument and then play them back (hardly "synthesis" as such). Actually creating from scratch something that even resembles the acoustic instrument requires an enormous amount of processing power to even come close, and even then the results are not particularly satisfactory, and tend towards sounding "contrived". The reasons for this, apart from the aforementioned complexity caused by the interactions of the waves in the oscillating membrane (be it drum skin, vocal cord, reed, string or lips), stem from the action of the resonator used to amplify/alter the sound, and are discussed in the next section.

Acoustic amplification - the soundboard, and various forms of resonant chambers, tubes and horns.

The property of resonance is fundamental to audio, and is the main cause of the huge variation in characteristics of acoustic instruments. The basic premise is that an object has a frequency at which it is easy to excite vibration, determined by the physical parameters of the object, and known as the resonant frequency. Most objects have several resonant frequencies, and any complex excitation will cause them to vibrate at those frequencies, thereby effectively filtering out the non-resonant frequencies (the non-resonant frequencies are hard to excite by comparison, and so die away quickly from the original complex excitation, while the resonant ones sustain easily).

A pendulum is an example of an oscillation with a single resonant frequency - it is easy to increase the amplitude of the oscillation if you time your pushes properly, but very difficult outside that timing. That timing is the resonant frequency, which can be changed by lengthening or shortening the pendulum.
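As a quick sanity check, the pendulum's single resonant frequency follows the standard formula f = (1/2π)√(g/L). A minimal Python sketch (the lengths here are purely illustrative):

```python
import math

def pendulum_frequency(length_m, g=9.81):
    """Resonant frequency of a simple pendulum: f = (1/2*pi) * sqrt(g/L)."""
    return (1.0 / (2.0 * math.pi)) * math.sqrt(g / length_m)

# A 1 m pendulum resonates at roughly 0.5 Hz (one full swing every ~2 s);
# shortening it to a quarter of the length doubles the frequency.
f_long = pendulum_frequency(1.0)    # ~0.498 Hz
f_short = pendulum_frequency(0.25)  # ~0.997 Hz
```

Pushing at any other rate fights the pendulum's natural motion - exactly the "hard to excite" behaviour described above.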

Most audio sources have multiple resonant frequencies (a fundamental, which is the main tone, and then several "harmonics", which are integer multiples of the fundamental). These harmonic resonances are formed by the action of standing waves - a characteristic pattern of resonance common to both string vibration and air columns, in which the combination of reflection and interference of the reflected waves with the incident waves causes the formation of "nodes" (points where the waves cancel to zero amplitude) and "antinodes" (points where they reinforce each other to maximum amplitude). In a string, the reflected vibration is flipped 180° in phase on reflection from a fixed end, causing the string to appear to vibrate in segments - the fact that the vibration is made up of travelling waves is not apparent, hence the term "standing wave". For an air column, the pressure phase is only flipped at an open end, due to the energy loss associated with encountering a lower acoustic impedance; the closed end does not cause a phase change, because there the wave encounters a greater acoustic impedance.
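For a string fixed at both ends, the standing-wave frequencies work out to f_n = n·v/(2L), where v = √(T/μ) is the wave speed on the string. A small sketch (the tension, mass and length figures are made-up, guitar-ish numbers, not measurements):

```python
import math

def string_harmonics(tension_n, mass_per_metre, length_m, count=4):
    """Standing-wave resonances of a string fixed at both ends:
    f_n = n * v / (2 * L), with wave speed v = sqrt(T / mu)."""
    v = math.sqrt(tension_n / mass_per_metre)
    return [n * v / (2.0 * length_m) for n in range(1, count + 1)]

# Illustrative values only: ~73 N tension, 1.1 g/m, 0.65 m speaking length.
freqs = string_harmonics(73.0, 0.0011, 0.65)
# The fundamental comes out near 198 Hz, with harmonics at exact
# integer multiples (2x, 3x, 4x) of it.
```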

For an in-depth description of all the various permutations involved in the property of resonance, go here. For our purposes, the main thing to remember is that we can use resonance as a sort of natural amplifier, through the properties of chambers, construction materials etc. We can also mitigate some of the effects of resonance - which is particularly useful in speaker enclosure design, where we need to spread the resonant curve of the driver to flatten out the peak associated with the free air resonance of the cone (that peak produces an uneven response at a particular frequency, and thus a less accurate conversion of the signal - we call the inaccuracy distortion, as it is a distortion of the original sound).

We will come back to some of the specifics later, but reading the information at Hyperphysics is highly recommended, as resonance is a fundamental concept in audio amplification. A basic understanding will probably do, but a good understanding of the principles will help you grasp many of the other concepts far more easily. If you plan to engineer any kind of band, it will also give you some understanding of the best ways to fit particular instruments into your sound spectrum (by allowing you to understand the sound produced and how its overtones combine to create an instrument's sound signature - and thus which frequencies can be used in other places, and which are essential to the character of the instrument) - essential to creating a clear, good mix in which all the instruments sit properly.

There are some terms associated with resonance that bear explanation, as they are not necessarily obvious in their definition -

For harmonics, the resonant frequencies are integer multiples of the fundamental (ie whole number multiples of the fundamental frequency in Hertz - referred to as the 1st harmonic, 2nd harmonic etc) - but we can also have non-harmonic resonant frequencies (percussive membranes in particular exhibit this characteristic), where they are not integer multiples. For this, the term overtone becomes useful:
We use the term "overtone" to indicate a resonant frequency above the fundamental. In a string resonance, or an open air column resonance, all the harmonics are resonant frequencies, so we say they have harmonic overtones. Closed air columns also produce only harmonic overtones (although only the odd harmonic values), and are also said to have harmonic overtones. A drum, however, will have non-harmonic overtones along with some harmonic ones, and so is said to have non-harmonic overtones.
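The harmonic/non-harmonic distinction is easy to see numerically. A sketch of the overtone series for open versus closed air columns, using an arbitrary 100 Hz fundamental:

```python
def open_column_resonances(fundamental_hz, count=5):
    """Open air column (or string): every integer harmonic is resonant."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def closed_column_resonances(fundamental_hz, count=5):
    """Closed air column: only the odd harmonics are resonant."""
    return [n * fundamental_hz for n in range(1, 2 * count, 2)]

open_res = open_column_resonances(100)      # [100, 200, 300, 400, 500]
closed_res = closed_column_resonances(100)  # [100, 300, 500, 700, 900]
```

Both series consist only of integer multiples of the fundamental, which is why both are said to have harmonic overtones; a drum's overtone series would include values that are not integer multiples at all.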

Audio Electronics 101 - Reproduction in analogue form.

The dictionary definition of analogue (the adjective) is:

1. (not comparable) (Of a device or system) in which the value of a data item (such as time) is represented by a continuously variable physical quantity that can be measured (such as the shadow of a sundial)

We already touched on the basics of creating an "analogue" of a sound in electrical form, so this section will be relatively brief. The basic concept of taking a kinetic input - which can be a raw sound wave (the example we used), or an oscillation produced by an instrument or device - and converting it into electrical impulses was the historical beginning of both PA reinforcement and the recording of audio for later playback. The idea of creating an analogue of the audio wave using a different energy form to represent the original really began with the phonograph (from the Greek roots phono, meaning "sound", and graph, meaning "writer") - a device which encoded the representation of the sound wave onto a physical surface (in the form of a disc or drum), using a long track, or groove, to represent time, and the indentations made in the groove to represent amplitude. (There are various claims as to who invented the first phonograph, but Thomas Edison seems to have the honour of patenting the first system that could replay the information as well as encode it.)

This use of an analogue of a sound wave recorded onto a physical medium is the basis of all sound recording techniques, and we can trace its origin right back to the wind-up organs of the Renaissance (which used indentations in a drum to play a tune on a set of tuned metal strips). The use of electrical impulses to represent the sound wave came later, and is the basis for all reinforcement techniques that aren't based around the configuration of venue acoustics. (Reinforcement of sound has a history stretching right back to ancient Greece and semicircular theatre designs like the Theatre of Dionysus in Athens. The culmination of knowledge in this area has resulted in designs like the Sydney Opera House and the Palau de la Música in Barcelona, amongst others - and the science of auditorium acoustics is as vital to an engineer as the science of amplification...)

The use of electricity as a medium for encoding the data required for an analogue representation allowed further conversions (from electrical to magnetic, for instance, as in the case of magnetic tape), and also allowed encoding into higher frequency spectra, as in the case of radio waves, which could then be transmitted over vast distances before being decoded back into the original sound.

Various techniques were also discovered to encode more information into the various analogue representations - the first devices were mono in operation, but stereo representations were invented and quickly became the standard (humans have two ears, so stereo was a natural progression). These basically encode the two audio channels into the two groove walls - one on either side - using lateral as well as vertical motion to represent the electrical signals. The pickup and tonearm design then effectively "decode" the movement and convert it back to stereo audio.
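The sum/difference idea behind stereo groove cutting can be sketched in a few lines - lateral motion carrying the mono-compatible L+R signal, vertical motion carrying the L-R difference. This is a simplified model of the scheme, not a description of any particular cutter or pickup:

```python
def encode_groove(left, right):
    """Model of stereo disc encoding: lateral motion carries the sum
    (mono-compatible), vertical motion carries the difference."""
    lateral = [(l + r) / 2.0 for l, r in zip(left, right)]
    vertical = [(l - r) / 2.0 for l, r in zip(left, right)]
    return lateral, vertical

def decode_groove(lateral, vertical):
    """The pickup recombines the two motions into left and right."""
    left = [m + s for m, s in zip(lateral, vertical)]
    right = [m - s for m, s in zip(lateral, vertical)]
    return left, right

left_in, right_in = [0.1, 0.5, -0.2], [0.0, 0.4, 0.3]
lat, vert = encode_groove(left_in, right_in)
left_out, right_out = decode_groove(lat, vert)
# The round trip recovers the original two channels.
```

A mono pickup reading only the lateral motion still gets a sensible L+R mix, which is why this arrangement was backwards-compatible.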

This is one of the main characteristics of analogue devices - there is no real "encoding" as such of the sound data; the waveform is actually created as an exact copy, and only requires energy conversion to reproduce the original. This has several advantages in terms of fidelity, but one main drawback: the analogue representation cannot be compressed without losing information. That isn't to say you can't compress it (AM and FM radio compress the analogue wave considerably before transmission), but doing so produces instantly noticeable changes in the reproduced wave at the other end.

This idea for using a representation of the sound in a different medium led to the invention of digital encoding however:

Audio electronics 01100101 - Reproduction in digital form.

With the invention of computing came the idea that you could store a representation of a sound in digital form. The basic premise takes the analogue electrical representation (which is essentially a function of amplitude over time) and samples it at a constant rate (known as the sampling rate). Each sample is then represented as a binary numerical value, with polarity encoded using the two's complement system. The bit depth is the precision of the numerical values representing each sample (8 bit encoding gets you 256 unique values, from -128 to +127; 16 bit encoding gets you 65,536 unique values, from -32,768 to +32,767). Obviously, the higher the bit depth, the more precise the encoding available (this is referred to as the resolution), and the higher the sample rate, the more frequencies can be represented. This relationship of sample rate to frequency is described in Nyquist's sampling theorem: the sampling rate must be greater than double the highest frequency component - or, turned around, the highest frequency encodable in a digital representation is just under half the sample rate. The full glory of Nyquist et al's theorem (which goes by a collection of names, but is most easily known as the sampling theorem) can be found here on Wikipedia if you want to treat yourself to a maths overdose.
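The two limits - one set by the sample rate, one by the bit depth - are simple enough to express directly. A trivial sketch using the CD figures from the text:

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2.0

def unique_levels(bit_depth):
    """Number of distinct amplitude values in two's-complement encoding."""
    return 2 ** bit_depth

cd_limit = nyquist_limit_hz(44100)  # 22050.0 Hz, the CD frequency ceiling
levels_8 = unique_levels(8)         # 256 values, from -128 to +127
levels_16 = unique_levels(16)       # 65536 values, from -32768 to +32767
```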

The amount of information encodable (in terms of stereo/surround etc) is limited only by the processing speed of the equipment, a fact that is responsible for the relatively low sample rate and bit depth of standard audio CDs (when they were invented, handling the roughly 1.4 million bits per second produced by 44,100 samples per second at 16 bit in stereo was at the cutting edge of computer technology. Now of course it's a tiny amount, but the standard has been set and has been used for many years). The small amount of loss (most people can't hear it at all) in the encoding is acceptable given the convenience of the standard, and the difficulties in changing it...

As an engineer, however, I would recommend using the best sample rate/bit depth combination you can manage - especially for recording - as some people can hear the loss of the information above 22,050 Hz that is not encoded on CDs (and it is possible that other people, while they cannot hear it as such, do perceive it somehow - many people describe CDs as clinical or cold sounding, and, when played audio recorded at a higher rate, prefer it). Remember, due to the subjective nature of much sound perception, the commonly used constant values are only a "best average" that we use for convenience - there are people with hearing perception outside these values... As processing power is ample and cheap these days, you have nothing to lose by using higher rates and bit depths. The only sticking point to remember is that when downsampling (going from a high sample rate and bit depth to a lower one), you need to use an anti-aliasing filter as described in the sampling theorem - I am not going to go into the whys and wherefores here (partly because I don't fully understand them myself), but the info on the Wikipedia site is accurate AFAIK, and quite comprehensive if the subject fascinates you...
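The 1.4 million bits per second figure for CD audio is just the three parameters multiplied together:

```python
# Uncompressed CD audio data rate:
sample_rate = 44100  # samples per second, per channel
bit_depth = 16       # bits per sample
channels = 2         # stereo

bits_per_second = sample_rate * bit_depth * channels
# 44,100 * 16 * 2 = 1,411,200 bits per second (~1.4 Mbit/s),
# or 176,400 bytes of storage for every second of audio.
bytes_per_second = bits_per_second // 8
```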

The pertinent facts for our purposes are the limit described by the Nyquist rate (sample rate = 2 × highest frequency encodable), and the fact that you need to anti-alias when you downsample (upsampling, while possible, is not ideal - once information is lost, replacing it can only be done by some form of interpolation, which is a guess at best). If you start at a high rate and resolution, you lose nothing in downsampling, and (especially if you oversample) the extra information encoded can be useful when compression is applied (eg when you encode to MP3 or Ogg), as lossy compression techniques particularly benefit from the extra information:
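Why the anti-aliasing filter matters can be shown numerically: a tone above the Nyquist limit produces exactly the same samples as a phase-inverted tone "folded" back below the limit, so once sampled the two are indistinguishable. A small sketch at the CD rate:

```python
import math

fs = 44100.0            # sample rate
f_high = 30000.0        # tone above the 22050 Hz Nyquist limit
f_folded = fs - f_high  # 14100 Hz: where the tone aliases to

# Sampling the 30 kHz tone yields sample-for-sample the same values as a
# phase-inverted 14.1 kHz tone. This is the aliasing that the filter
# prevents, by removing content above the Nyquist limit before sampling.
for n in range(20):
    s_high = math.sin(2 * math.pi * f_high * n / fs)
    s_folded = -math.sin(2 * math.pi * f_folded * n / fs)
    assert abs(s_high - s_folded) < 1e-9
```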

Audio electronics 0065 - Compressing by various means.

The process of digitising our audio representation also opens the doorway to the use of compression techniques - which allow us to represent the sound in a smaller package (useful for storage, for electronic transmission over the likes of the internet, and for squeezing large amounts of audio data into ever tinier devices)...

There are two methods of compressing data; lossless, and lossy. As the names suggest, lossless compression doesn't lose any of the information, whereas lossy does.

Lossless compression is difficult in audio, because the information changes rapidly and constantly, so standard algorithms don't work very well on the raw samples. However, convolution with a [-1 1] filter (taking the difference between successive samples) tends to whiten, or flatten, the spectrum slightly, thereby allowing the use of traditional techniques like run-length encoding, Huffman coding, Lempel-Ziv-Welch (LZW) etc. Integration at the decoder then restores the original signal. Various CODECs (COder/DECoders) use Linear Prediction Coding (LPC) techniques to estimate the spectrum of the signal. An inverse of the estimator (which is a statistical method of prediction using Maximum Likelihood Estimation (MLE) to fit real world data to a mathematical model in an optimum way) is used to whiten the spectral peaks, while the estimator itself is used to reconstruct the original signal at the decoder.
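The [-1 1] convolution is just "store the change from the previous sample instead of the sample itself", and the integration at the decoder is a running sum. A minimal sketch of the round trip (the sample values are arbitrary):

```python
def difference(samples):
    """Convolve with the [-1, 1] filter: each output is the change since
    the previous sample. Smoothly varying audio yields small differences,
    which entropy coders (Huffman, LZW etc.) compress far better."""
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def integrate(diffs):
    """Running sum at the decoder restores the original signal exactly."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out

signal = [0, 3, 7, 12, 14, 13, 11, 8]
deltas = difference(signal)   # [0, 3, 4, 5, 2, -1, -2, -3]
restored = integrate(deltas)  # identical to signal: nothing is lost
```

Note how the differences span a much smaller range (-3 to 5) than the raw samples (0 to 14) - that reduced spread is the "whitened" signal the entropy coder exploits.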

There are many different lossless CODECs available, and a comparison of the pros and cons can be found here.

Lossy compression techniques have been very prevalent until recent years, due to the high rate of compression they can achieve - the small packages enabled transmission over the relatively slow internet, and the fitting of lots of audio onto the relatively small storage devices of the time. They are slowly losing popularity, due to the speeding up of network transmission and the miniaturisation of comparatively large storage media (20GB of data storage in a media player enables the storage of over 2,000 minutes of uncompressed CD quality audio - over 33 hours and 20 minutes). Lossless compression can almost double this - so you get the idea (I defy anybody to need 60+ hours of audio storage for entertainment)...
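The 20GB figure is straightforward to check against the CD data rate (treating 20GB as 20 GiB here; decimal gigabytes give a slightly smaller answer):

```python
# How much uncompressed CD-quality audio fits in 20GB of storage:
byte_rate = 44100 * 16 * 2 // 8  # 176,400 bytes per second of stereo CD audio
capacity = 20 * 1024**3          # 20 GiB in bytes

minutes = capacity / byte_rate / 60
# ~2029 minutes, i.e. just under 34 hours of uncompressed audio
```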

Lossy techniques do have applications where they are still essential though (radio streaming over the internet for example, which needs a low bitrate to allow as many connections as possible).

Of the lossy algorithms, MP3 is the best known, with newer algorithms like Ogg Vorbis, Apple's AAC and Sony's ATRAC also widely used (although less well known). These lossy techniques primarily use what is known as psychoacoustics to discard information whose loss will (hopefully) not affect the perception of the audio. So, for example, because the human ear has a higher sensitivity to the 2-5 kHz frequency band, as discussed in the biology section, we can discard some of the data above and below those frequencies. Or, due to what is called the "masking effect", where a loud sound masks a quieter one, making it inaudible, we can lose some of the data for the quieter sound. I am not going to investigate the methods in depth, as Wikipedia again has an excellent article.

To achieve these perception-based encodings, various methods are used. In transform domain methods, the sound has a discrete transform applied to it (our lossless convolution technique begins with a discrete transform known as a Discrete Fourier Transform (DFT)). A related transform known as a Modified Discrete Cosine Transform (MDCT) is used in encoding MP3s - not on the audio signal directly, but rather on the output of a 32 band Polyphase Quadrature Filter (PQF) bank. The output of this MDCT is then used in an alias reduction technique to reduce the typical aliasing of a PQF bank at each band.

Time domain coding (like the LPC used in lossless compression) can be used with models of the sound's generator as the estimation technique, creating much higher compression than our earlier use of an inverse estimator for lossless compression. These models cause loss of data because they use a fixed model estimation rather than a comparison estimation (an example would be using a model of the human vocal tract to whiten the spectrum when encoding speech)...

As should (hopefully) be obvious by now, the fidelity of reproduction when using lossy compression is highly dependent on both the quantity of compression applied and the depth of analysis applied during encoding. The quantity dictates how severe the dropping of "inaudible" frequencies etc. will be, while the depth of analysis dictates how accurate the predictive model will be. As processing power has multiplied, the accuracy of the models has increased (which is the reason newer methods like Ogg and AAC are better at low bitrates than older formats like MP3, the original design of which was created almost 20 years ago)...

So as an engineer, your preference should start with as high a bitrate representation as possible, lowered only when absolutely necessary (ie by the target medium, bandwidth restrictions etc). You can always "lose" information, but once lost it is difficult, if not impossible, to get back....

This has been a fairly whirlwind investigation of audio representation techniques which I hope has made some useful sense. Don't worry if you don't understand it all (if you do, then please correct any mistakes I have made, as I get lost on some of it - especially the maths which has never been a favourite subject of mine). The idea is to develop an overview of how it works, and to understand the pertinent facts which relate to the practical aspects we will be investigating later on.....
This article was last edited by dreamdust on Sat Sep 12, 2009 11:38 am


