by Ralph Glasgal
 

 

Evaluating Loudspeakers
By Robin Miller III AES, SMPTE, BSEE


Using Non-sentimental Sound-makers in Stereo And Ambiophonics 

Budget and aesthetics aside, savvy consumers would prefer that their home loudspeakers perform like professional monitors – the class of speaker on which material decisions are based during recording, mixing, and mastering as to the “sound” of the music, movies, and games that users enjoy. Alas, even when less expensive than high-end audiophile speakers, professional monitors often look too utilitarian. However, some manufacturers offer equivalents designed for consumer tastes.  Whether for consumers or professional audio engineers, some means of intelligent selection is needed. 

Accordingly, four practicing professional audio engineers, including the author, conducted a unique speaker “shoot-out.”  Four pairs of monitors were assembled in an acoustically fine control room.  They ranged in price from $1,800 to $7,000 US per pair.  A high-end audiophile loudspeaker costing $25,000 per pair was auditioned separately.  The four monitor pairs were self-powered, fed by a stereo pair of analog signals from a routing switcher. Speakers were clustered so it was not obvious which pair was playing. However, instead of music familiar to the subjects, the test material was a special assortment of noise-makers, assembled, recorded, and packed in a gym bag so that it was on hand during subjective testing, where reproduction could be compared to the live sources.  This paper explores the theory, methodology, and surprising results. 

“Subjective Testing” by audio professionals or consumers

Typically, shoot-outs are organized informally, with participants bringing favorite source materials.  Often musical selections, movie scenes, or gaming environments are drawn from commercial releases that the subjects like, but not material they have produced or had first-hand experience in monitoring during production.  These after-the-fact evaluations are similar to consumers evaluating audio equipment in the varied (poor?) acoustics of dealers’ showrooms based on a “reference CD” brought to the audition in order to be “objective”. 

Any scientist would find fault with this methodology.  They would call “subjective testing” an oxymoron because, instead of a single variable, a single device under test (DUT), there are many.  Interaction and inconsistency reign. And emotion – sentiment – gets in the way of meaningful, repeatable results.  

What is heard during audio reproduction is the end of a complex chain of events and devices: the original sources, recording acoustics, recording microphones and technique, recording electronics and storage format, mixing techniques including processing, mastering techniques including more processing, distribution media, users’ reproduction electronics, speakers varying in positioning, and finally the listening acoustics. 

If loudspeakers in a room were perfect reproducers, they would all sound alike – indistinguishably neutral, neither adding to nor detracting from life-like sound.  In reality, they mostly sound very different, varying in ability to reproduce all frequencies equally, all volumes equally, and all directions equally.  Although they often cost half or more of a user system’s budget, they compromise these ideals and color the sound.

Conditioning and emotionality of professionals or consumers

The various interplays among all the links in the audio chain are too large a subject for this paper, and have been observed and debated through decades of stereophonic tail-chasing by audio practitioners.  Unfortunately, most people have been subjected to so much questionably recorded material that they are conditioned to imperfection. We’ve become inured to over-processed, unnatural, and conflicting psycho-acoustic cues, so much so that it redefines “transparent.”  Paradoxically, when we are exposed to naturally life-like sound, it is often perceived in the redefined context of “audio,” and we think it is wrong.

Seeking to sidestep part of this problem, this informal study attempted to deal with perhaps the most critical hurdle – the fact that different genres of music elicit different emotional reactions in different people.  Some like jazz but not rap; heavy metal tastes clash with classical, etc.  So a kit of sound-producing instruments was assembled that avoided listener sentimentality.  Some are unidentifiable.  Their sonic signals test the limits of reproducibility. Not distracted by preconceived notions, subjects could concentrate on perceived differences while comparing live to reproduced sounds, where the only variable was the monitor speakers under test.  While not perfect, this resulted in a practical approach to more precisely evaluating the very means of monitoring, which in turn leads to better decisions and artistic results in the work of the engineers involved and, by extension, to greater satisfaction among their customers. 

Taking a Non-sentimental Journey

Recording engineers have jangled keys or walked around recording venues rattling maracas with a similar goal – exciting transient and tonal sounds while keeping emotion out of the equation. In academic studies, researchers such as those at McGill University have statistically analyzed the responses of subjects to bells and bamboo chimes.  The writer e-mailed fifty fellow engineers around the globe, requesting nominees for the non-sentimental analogs kit that were practical and portable, or easily replicable. 

Ideal test sounds ought to represent singly, for clearer discernment, individual characteristics similar to those found in voices, instruments, or sound effects.  They should fully exercise the capabilities of transducers and electronics in the recording chain. And they should be portable enough to be carried in a bag or shipped for use by others in their evaluation sessions, so results can be widely compared.  Since the most critical testing begins and ends, as does most recording and reproduction, with analog sound waves, the instruments chosen are all analog wave-makers. (It is acknowledged that electronic instruments are important, but in the absence of the actual instrument, we have no reference for how it sounds live.) 

With these criteria in mind, along with a budget of $100 US, the writer assembled an initial kit of five “non-sentimental analogs” in a gym bag.  These were recorded in the writer’s studio using a spatial microphone technique (i.e. not a panned monaural microphone) so that an uncorrelated two-channel recording was made to avoid comb filtering during speaker-stereo reproduction. Microphones and converters (by Schoeps and Grace Designs) were of high quality.  Taken to the shootout for live comparison with speaker replay were both the 2-channel recording in CD form and the portable kit of five sounds:

  1. An orchestral triangle (having no harmonically related overtones, so that any integer-multiple overtones would be harmonic distortion produced in recording or reproduction);
  2. A bunch of about 20 shells with dried seeds emitting a cackle of transient and low- and mid-range tonal resonances;
  3. A “thunder machine” with a dangling spring to excite low-mid fundamental frequencies in the drum membrane;
  4. A German earthenware vase 24 inches high and nine inches in diameter that, rapped by a knuckle, emitted a bell-like tone, followed by a tremulous sustain;
  5. A common percussionist’s plastic egg shaker that makes a lot of high-frequency transients.
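The reasoning behind item 1 can be illustrated with a minimal sketch. A pure test tone stands in for one partial of the triangle, and the Goertzel algorithm compares the level at an integer multiple of that partial with the level of the partial itself. The 1 kHz frequency, the 48 kHz sample rate, and the quadratic distortion model are illustrative assumptions, not measurements from the session.

```python
import math

def sine(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Generate a sine test tone as a list of floats."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def goertzel_amplitude(samples, sample_rate, freq_hz):
    """Estimate the amplitude of one frequency component (Goertzel algorithm)."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / n

def harmonic_level_db(samples, sample_rate, partial_hz, order=2):
    """Level of the order-th integer multiple relative to the partial, in dB."""
    partial = goertzel_amplitude(samples, sample_rate, partial_hz)
    harmonic = goertzel_amplitude(samples, sample_rate, order * partial_hz)
    return 20.0 * math.log10(max(harmonic, 1e-12) / max(partial, 1e-12))

# Hypothetical stand-in for one triangle partial: 1 kHz, 0.1 s at 48 kHz.
SAMPLE_RATE = 48000
clean = sine(1000.0, SAMPLE_RATE, 4800)

# Assumed toy nonlinearity: a small quadratic term, as a distorting
# link in the chain might add. It creates energy at exactly 2x the partial.
distorted = [x + 0.1 * x * x for x in clean]

print("clean     2nd-multiple level (dB):", harmonic_level_db(clean, SAMPLE_RATE, 1000.0))
print("distorted 2nd-multiple level (dB):", harmonic_level_db(distorted, SAMPLE_RATE, 1000.0))
```

Because the live source is assumed to have no energy at integer multiples of its partials, any level found there in the reproduced signal points to the chain rather than to the instrument.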

No source emitted significant energy below about 200 Hz, avoiding variables in the bass region which might have been introduced by subwoofer implementations and room modes.

Shoot out: The Last Monitor Standing

A sentimentally-contaminated process serves no productive purpose. For example, one might assume, even before hearing it, that the most exotic (expensive or cool-looking) apparatus would have to be best.  Or that if one now hears qualities s/he prefers to any prior hearing of a familiar recording, then the speaker is magic.  Or, rather than believing one’s ears, one might believe advertisements or marketing talk. So four candidate speaker pairs were obtained, set up avoiding reflections from nearby walls and console surfaces, and had their SPL outputs calibrated.  Several hours later a consensus was reached that surprised everyone:
  • Pairs #1 & 3 were most neutral when directly comparing the recorded to the live triangle – possibly the most telling source. Pair #2 added overtones that were musical enough to reveal they were harmonic distortion.  Pair #4 was off-putting in tone color, and so was eliminated from further consideration.
  • Pair #1 was thinner sounding compared to the live shells, which had tonal characteristics emphasizing 200 to 300 Hz.  Pair #3 was nearly as full as the real thing. With the shells’ high crest-factor transients approaching full scale, pair #2 distorted badly.
  • The thunder machine, with a low-mid frequency tonal resonance, reproduced equally naturally on pairs #1, #2, & #3.
  • Like the triangle, the earthenware vase also separated the men from the boys.  Pair #3 was able to reproduce the long sustain along with the subtle but beautiful tremolo heard live in the instrument, where Pairs #1 & #2 abbreviated this sustain.
  • The egg shaker, with multiple very high frequency transients, was reproduced without exaggeration on pair #2 but was boosted in highs by pairs #1 & #3.
On the basis of the non-sentimental analogs kit live v. recorded tests, the consensus was that pair #3 was the clear winner, with pair #1 either tied or a very close second.  Surprising was that pair #3 was the least expensive and pair #1 the most expensive by a factor of 3½.  Pair #2 came in a clear 3rd despite a price 20% less than pair #1.  Pair #4 failed to make the cut.
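The "high crest-factor transients" noted for the shells refer to the ratio of peak level to RMS level. A minimal sketch of that calculation follows; the two test signals (a steady sine and a sparse click train) are illustrative stand-ins, not the session recordings.

```python
import math

def crest_factor_db(samples):
    """Crest factor: peak level relative to RMS level, in dB."""
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("signal is silent; crest factor is undefined")
    return 20.0 * math.log10(peak / rms)

# A steady tone: 1 kHz at an assumed 48 kHz sample rate, whole cycles only.
tone = [math.sin(2.0 * math.pi * 1000.0 * i / 48000.0) for i in range(4800)]

# A sparse click train: one full-scale sample in every 100, silence between.
# This is a crude stand-in for rattling, transient-rich sources.
clicks = [1.0 if i % 100 == 0 else 0.0 for i in range(4800)]

print("tone crest factor (dB):  ", crest_factor_db(tone))
print("clicks crest factor (dB):", crest_factor_db(clicks))
```

A transient-rich source with peaks near full scale has a much lower average level than a steady tone with the same peaks, so a speaker that sounds clean on tonal material at a matched level can still be driven into distortion by the peaks.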

Music, Music, Music

After the relative sterility of the analogs, it was decided after all to listen to music to see if any correlation could be discerned.  Acoustic music included orchestral, big band, and voice & piano trio.  The first two, produced by the writer, were pronounced by the others as “natural-sounding” on pair #3, a bit too bright on pair #1, and “boxy” on pair #2, with cabinet resonances reacting to timpani.  The voice & piano trio was a commercial recording that, although no one present had experienced its production, was preferred on pair #1 despite emphasizing high frequency harshness on sibilants – probably artifacts of compression.

This last point illustrates the circular logic that afflicts strictly subjective shootouts: material that subjects know only from hearing it in various replay situations, rather than live, simply takes on a new preference.  When choices are being made during production or in post, loudspeakers proven to reproduce sounds most like the real signals should be preferred as the reference.  Make the sound whatever you like using these monitors, confident there will be fewer surprises later. Finding a sound that is preferable after the fact for one isolated replay situation is irrelevant or even wrong for every other. 

Discussion & Conclusions

Using “non-sentimental analogs” as test signals, the group was able to reach consensus about speaker selection: hype, whether suggested by high-end pricing or sales talk, is no substitute for rigorously selecting a well-engineered monitor loudspeaker.  Is it a coincidence that the top-rated monitor is the only one of the four evaluated whose manufacturer publishes measured performance data and curves for this and many of its other products, and has earned a reputation for honest sound? Such information is meaningful, especially regarding off-axis dispersion response, which tells more about how the speaker will couple to the room, with which it forms a system.  If speaker selection is not rigorous, mixing decisions and results will be tainted, adding to acoustic changes in timbre, imaging, envelopment, and speech intelligibility due to room reflections. 

All four participants agreed that evaluating the non-sentimental analogs first affected how they listened to and evaluated the music that followed. While they might have succumbed to an emotional bias had they listened only to their “reference music,” the non-sentimental sources set the stage for more informed listening-based evaluation.  If knowledge is power, perhaps more finely-honed perception is too. 

The transducers in the recording/reproduction chain are the most challenged components.  Unlike electronics, with their relatively easy tasks, microphones and loudspeakers need to change one form of energy into another. The closer to ideal a microphone is, the more likely it will sound like another close-to-ideal microphone, neither contributing artifacts of its own.  The same should be true of loudspeakers; however, the dimensions, complexity, and power levels involved imply greater, clearly audible colorations. 

Real microphones were involved in recording the analogs used to evaluate the loudspeakers: did something about the microphones favor one speaker over another? Microphones, preamplifiers, and A-D converters were a necessity in the process of course, but most would accept that the equipment chosen (Schoeps microphones, Grace Designs preamp/AD, no EQ or compression) would not bias the results.

Even if they are surprising, the writer, until he knows a better method, is confident of these findings using non-sentimental analogs. Coincidentally, the orchestral recordings used were made with the same microphones used to record the analogs, and mixed on the same speakers evaluated as pair #3, validating that no compression or equalization was needed – choices which were confirmed when the recordings were reproduced only slightly brighter on the close-second-placed pair #1. 

The surprising result is that the least expensive DUT came out on par with the most expensive using this approach.  This seeming paradox may be due to the DSP correction inherent in #3’s design, whereby driver nonlinearities are corrected by inverse digital filters, rather than by more expensive physical solutions. 

APPENDIX: Units Evaluated

Microphone – Schoeps CCM3 (2), CCM8 (2), 18cm sphere 

Preamp/Converter – Grace Designs M802 

Pair #1 – Barefoot MM27 

Pair #2 – Focal Twin6 Be 

Pair #3 – JBL LSR4328P* 

Pair #4 – Quested S-7 

Pair #5 – SoundLab Majestic (see below). 

* Note: While this unit’s automatic Room Mode Correction system was utilized during evaluation, the sources had insignificant energy below 200 Hz where these corrections might have had effect. 

 

Pair #5 above are SoundLab Majestics, full-range electrostatic speakers (bi-directional) in the foreground, pictured at the Ambiophonics Institute, that compared very favorably to all five live sources in the experiment. 

SIDEBAR: Speaker-binaural Ambiophonics

In a separate session using high-end full-range electrostatic speakers (pair #5), the writer compared the non-sentimental analogs kit, live v. recorded, using Ambiophonic principles and crosstalk cancellation.  No matter what the price of the speaker, successful crosstalk cancellation requires good channel balance, and consistency between a pair of loudspeakers, especially of phase and frequency response through the crossover region.  (Of course full-range electrostatics have no crossover issues.)

The microphone technique used in recording the analog sound makers is compatible with loudspeaker binaural (Ambiophonic) reproduction.  In fact, for critical listening by one or two listeners sitting along the median line between Ambiophonic’s closely-spaced speakers, many if not most legacy stereo recordings, whether from LP or CD sources, benefit from Ambiophonics. 

Ambiophonics speakers FL & FR in front convey a stage equaling the original recording angle extending to virtual FL & FR, uncolored center images, and more listener envelopment (LEV), reaching the maximum regions shown, all in contrast to conventional stereo speakers L & R.

Briefly, replay over conventionally placed stereo speakers, positioned in an equilateral stereo triangle with the listener, limits auditory imagery, including ambience originating around and above the listener, to the frontal 60° span of the speakers.  Lost are those details, captured in spatially rich recordings, that contain binaural cues with both the inter-aural time difference (ITD) and the inter-aural level difference (ILD) important in natural hearing.  In contrast to spatial recording techniques that capture ITD, many disc producers pan monaural spot mics or direct sound sources between channels to create phantom images with ILD only.  Panning by ILD alone produces correlated speaker signals, including unintended, delayed arrivals at the far ears because of the acoustic crosstalk inherent in conventional stereo speaker placement.  The resulting comb filtering distorts pinna cues and reduces both ILD and ITD.
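The comb filtering described above can be sketched with a deliberately simple free-field model. For a centrally panned (identical) signal, each ear receives the near speaker's sound plus a delayed copy from the far speaker. The 0.18 m ear spacing, the 343 m/s speed of sound, and the neglect of head shadowing and room reflections are all simplifying assumptions, so the numbers are indicative only.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
EAR_SPACING = 0.18       # m, assumed effective ear spacing

def crosstalk_delay_s(speaker_angle_deg, ear_spacing_m=EAR_SPACING,
                      c=SPEED_OF_SOUND):
    """Extra travel time to the far ear for a speaker at the given angle
    off the median line (plane-wave model: d * sin(angle) / c)."""
    return ear_spacing_m * math.sin(math.radians(speaker_angle_deg)) / c

def comb_notches_hz(delay_s, max_hz=20000.0):
    """Frequencies where direct and delayed copies cancel: (2k + 1) / (2 * delay)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches

def comb_gain_db(freq_hz, delay_s, crosstalk_gain=1.0):
    """Magnitude of 1 + g * exp(-j * 2*pi * f * delay), in dB."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    real = 1.0 + crosstalk_gain * math.cos(phase)
    imag = -crosstalk_gain * math.sin(phase)
    return 20.0 * math.log10(max(math.hypot(real, imag), 1e-12))

# Conventional stereo: speakers at 30 degrees either side of center.
delay = crosstalk_delay_s(30.0)
print("crosstalk delay (microseconds):", delay * 1e6)
print("first notches (Hz):", [round(f) for f in comb_notches_hz(delay)[:3]])
# A crosstalk gain below 1 loosely represents some head shadowing (assumed value).
print("depth at first notch, g = 0.7 (dB):",
      comb_gain_db(comb_notches_hz(delay)[0], delay, crosstalk_gain=0.7))
```

In this model a smaller speaker angle shortens the crosstalk delay and pushes the first notch higher in frequency, which is one way to see why speaker placement matters here.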

ITD is important for headphone listening, but also for speaker-binaural, enabled by crosstalk cancellation, such as used in Ambiophonics.  By moving the speakers close together in front and using crosstalk cancellation, Ambiophonics creates images up to 180° wide and avoids comb filtering and pinna confusion for central voices that, in conventional stereo, come in fact from speakers at the sides. The disadvantages of Ambiophonics are the need to listen on a “sweet line” median to the speakers, and errors due to mismatched speaker levels, or listening off-axis.  Hence Ambiophonics is for family or small group listening or when seated in a fixed position such as in a car, at a workstation or at a gaming console.

In Filmaker Technology’s Listening Lab C, the analogs kit recordings produced images perceived up to 150° wide using pair #3, this time closely spaced in front, together with the Recursive Ambiophonic Crosstalk Elimination (RACE) algorithm.  The recording angle had been 180°, producing the maximum 640 µs interaural time difference (ITD) inherent in the ear-spaced microphone used, along with head-shadow-like ILD.
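The recursive idea behind crosstalk elimination can be sketched as follows: each output channel is the input minus an attenuated, delayed copy of the opposite output, so that every cancellation signal is itself cancelled in turn. This is a minimal illustration of the general principle, not the published RACE implementation. The function name, the 3-sample delay (about 62.5 µs at an assumed 48 kHz), and the 2.5 dB attenuation are hypothetical values, and any band-limiting a practical implementation may apply is omitted.

```python
def recursive_crosstalk_cancel(left, right, delay_samples=3, attenuation_db=2.5):
    """Recursive crosstalk cancellation sketch (hypothetical parameters).

    out_l[n] = left[n]  - g * out_r[n - delay]
    out_r[n] = right[n] - g * out_l[n - delay]
    """
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    if delay_samples < 1:
        raise ValueError("delay must be at least one sample")
    gain = 10.0 ** (-attenuation_db / 20.0)
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        j = i - delay_samples
        # Feedback comes from the opposite channel's earlier output,
        # which is what makes the cancellation recursive.
        from_r = out_r[j] if j >= 0 else 0.0
        from_l = out_l[j] if j >= 0 else 0.0
        out_l[i] = left[i] - gain * from_r
        out_r[i] = right[i] - gain * from_l
    return out_l, out_r

# A single impulse in the left channel shows the alternating, decaying
# cancellation train that the recursion produces.
impulse = [1.0] + [0.0] * 11
silence = [0.0] * 12
out_left, out_right = recursive_crosstalk_cancel(impulse, silence)
for index, (l, r) in enumerate(zip(out_left, out_right)):
    print(index, round(l, 4), round(r, 4))
```

The impulse response alternates between channels with inverted polarity at each step and decays by the chosen attenuation, so the correction dies away rather than ringing indefinitely.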

At the Ambiophonics Institute outside New York City, full-range electrostatic speakers (pair #5) also reproduced images over a perceived angle of 150°.  Unlike what might result using hybrid cone-electrostatic speakers, the full-range electrostatics compared very favorably to the live sources. Recordings made using the author’s PanAmbiophone also reproduced images of both instruments on stage and important early side wall reflections up to 150° in width.  Such images cannot be auralized with conventional 60° speaker stereo. This extra width allows legacy stereo recordings to approach natural spatiality – maximizing listener envelopment (LEV, see illustrations) – whereby enveloping reflections are perceived at wider angles than staged instruments, as they would be when heard live.  Yet it requires only two channels and two speakers, without surround speaker layouts or multi-channel recordings.  (Ambiophonics also supports increasing envelopment by adding hall response convolution signals fed to surround speakers.)

PanAmbio surround adds back speakers BL & BR, imaging as BL & BR where listener envelopment (LEV) is maximum.  (Note: Play 5.1 in PanAmbio by setting the player to “no center” to mix the C channel to the front speaker pair.)

5.1-compatible surround sound for movies, gaming, or multi-channel music is accomplished by adding a second Ambio pair in back – termed PanAmbio (see illustration above).