Chapter 1

Introduction

Summary

New multichannel audio formats have recently been standardized, so technologies originally developed for stereo must be redesigned. In this thesis, five-channel pan pot techniques employing interaural intensity difference cues are compared analytically, using quality criteria from the literature, and experimentally, using controlled listening tests.

Problem Definition

Context

Stereophonic reproduction has been the dominant method of playing recorded music for several decades. People have been listening to stereo LPs since 1958, stereo FM radio since 1961, and stereo TV since 1986 [1]. Unlike monophonic reproduction, stereo enables listeners to localize sounds in the recording and experience a sense of spaciousness (also called listener envelopment or ambiance) [2]. Since stereo has always been preferred to mono, it seemed reasonable to expect that multichannel systems with four or more channels would perform even better. These systems are variously called surround sound systems or multichannel formats. While much interest was generated by four-channel surround sound systems in the early 1970s, the lack of one unified standard format prevented these "quadraphonic" systems from achieving market acceptance.

Multichannel audio formats that have succeeded were originally created to fulfill the needs of the film industry. The first such format was the three-channel, five speaker "Fantasound" format used for performances of Disney’s Fantasia [3]. Years later, Dolby Laboratories developed several popular multichannel soundtrack formats for commercial cinema applications [4]. The professional Dolby Stereo format, introduced in 1976, borrowed matrixing techniques from quadraphonic theory to encode four channels of audio into two channels optically recorded on motion picture film. Left (L), right (R), band-limited surround (S), and center (C) channels were available in Dolby Stereo, with the center channel finding its main use with dialogue. Unlike the audio industry, the film industry has always understood "stereo" to mean two or more channels [1]. The four-channel Dolby SR optical format, released in 1987, improved on Dolby Stereo with a wider dynamic range and frequency response.

The Dolby Digital format was introduced in 1992. This "5.1" multichannel format allows up to five full bandwidth channels (20 Hz - 20 kHz) and one low frequency effects channel (20 - 120 Hz) to be encoded optically on film. The five full bandwidth channels in this digital format are left, right, center, left surround (SL), and right surround (SR). Dolby Digital uses a lossy data compression scheme to achieve a low bit rate. Quantization noise is allowed to rise with the expectation that it will be psychoacoustically masked by the audio signal. A 5.1 channel format similar to and competing with Dolby Digital is the DTS Digital Surround format from DTS Technology.

As consumers grew accustomed to high quality surround sound in movie theaters, the video revolution came to the living room. The stereo videocassette was introduced in 1978, followed by the higher quality, stereo laser disc in 1980 [1]. These consumer formats created a huge demand for movie content. Dolby Surround (L, R, S) and later Dolby Surround Pro Logic (L, R, C, S) were developed so that Dolby’s four-channel movie soundtracks could be experienced using these stereo media. Movies with Dolby Digital (formerly known as AC-3) soundtracks first came to the home via laser discs in 1995. Dolby Digital since has been chosen as the multichannel audio format for DVD video and American digital television (DTV). DTS has entered the home on special compact discs (requiring special decoders) and may be included in video DVDs. As a result of these and other technologies, the once separate worlds of hi-fi audio and television are merging into the "home theater." An excellent overview of this new phase in surround sound is given by Steinke [5].

Concepts

The panoramic potentiometer, or pan pot, has been an integral part of audio recording equipment since the beginning of stereo. (The terms "pan-pot" and "panpot" also are used in the literature but "pan pot" seems a better abbreviation.) The rotary potentiometer typically used in this application is simply a knob that varies the electrical resistance between one input terminal and each of two output terminals. A monophonic audio signal is applied to the input terminal and the output terminals are connected to the left and right stereo channels. By turning a pan pot, the engineer is able to place the phantom image of a monophonic sound source in the "panorama" or stereo field between two speakers.
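The behavior described above can be sketched in software. The following is a minimal illustration only, assuming an idealized linear (constant-gain) law in which the knob position runs from 0 (fully left) to 1 (fully right); the function names `stereo_pan` and `pan_signal` are hypothetical, and real analog pan pots generally follow other gain curves.

```python
def stereo_pan(sample, position):
    """Split one mono sample between the left and right channels.

    position: 0.0 = fully left, 0.5 = center, 1.0 = fully right.
    A linear (constant-gain) law is assumed purely for illustration.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    left_gain = 1.0 - position
    right_gain = position
    return left_gain * sample, right_gain * sample


def pan_signal(samples, position):
    """Apply the same pan position to every sample of a mono signal."""
    return [stereo_pan(s, position) for s in samples]
```

With the knob at center, `stereo_pan(2.0, 0.5)` sends the sample to both channels at half amplitude, which is the condition under which a phantom image is expected midway between the speakers.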

The simplest quality criterion for a pan pot is that the spatial location of a sound source desired by the recording engineer corresponds to what is heard in the reproduction system with a high degree of accuracy. Pan pots are necessary for imparting directivity onto direct instrument signals that include no inherent spatial information. These include (1) acoustic instruments recorded with a single, close microphone and (2) outputs of electronic instruments. Phantom images of recorded acoustic sources also may be placed in the stereo field using one of several techniques involving two or more microphones. Pan pots then are applied to each of the microphone signals depending on the microphone technique and the application.

Elmar and Leal considered several artistic uses of the pan pot [6]. Pan pots may be used to (1) imitate the locations of different musicians on stage, (2) increase ("enhance") the distances between elements of multisource instruments such as drum sets, (3) make several instruments seem to originate from all directions between the speakers, as in Phil Spector’s "wall of sound," (4) distribute sound sources in an artistic way, such as spreading instruments for roughly even loudness everywhere between the speakers, (5) simulate the existence of a room in which the musicians are playing with the help of artificial reverberation, and (6) add directivity to monophonic and stereo direct instrument signals. They note that the stereo outputs on some electronic instruments do not guarantee specific spatial directivity because their goal seems to be spaciousness rather than localization.

Rationale

New recording equipment is necessary for the creation of audio content in the 5.1 multichannel formats. Five-channel pan pots must be developed that are capable of positioning a sound anywhere between five speakers positioned around the listener. Like all creative tools, these pan pots will invite new recording techniques and philosophies. A few dedicated multichannel pan pots are already available in hardware from OmniSound Corporation [7] and TMH Corporation [8], and in software from Kelly Industries [9] and Dolby Laboratories [10], among others. While it is possible to design multichannel pan pots by applying traditional stereo panning techniques, alternate panning methods should be investigated for possible increased benefit. Potential benefits include more precise localization and uniform image movement while rotating the pan pot.
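One way to apply a traditional stereo technique to five speakers is pairwise panning: the two adjacent speakers that bracket the desired azimuth are treated as a stereo pair. The sketch below is an illustration under stated assumptions, not a method taken from this thesis. The speaker azimuths follow the common ITU-R BS.775 layout (the text does not specify them), the sine/cosine constant-power law is assumed within each pair, and the name `pairwise_pan` is hypothetical.

```python
import math

# Assumed speaker azimuths in degrees (0 = front, positive = to the right).
# These follow the common ITU-R BS.775 layout; the text does not specify them.
SPEAKERS = [("SL", -110.0), ("L", -30.0), ("C", 0.0), ("R", 30.0), ("SR", 110.0)]


def pairwise_pan(azimuth_deg):
    """Return {speaker: gain} for a source at azimuth_deg.

    The adjacent speaker pair that brackets the source is located, and a
    sine/cosine (constant-power) law is applied between those two speakers.
    """
    az = (azimuth_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    gains = {name: 0.0 for name, _ in SPEAKERS}
    n = len(SPEAKERS)
    for i in range(n):
        name_a, ang_a = SPEAKERS[i]
        name_b, ang_b = SPEAKERS[(i + 1) % n]   # last pair wraps SR -> SL
        span = (ang_b - ang_a) % 360.0          # clockwise arc from a to b
        offset = (az - ang_a) % 360.0           # clockwise arc from a to source
        if offset <= span:
            frac = offset / span                # 0 at speaker a, 1 at speaker b
            gains[name_a] = math.cos(frac * math.pi / 2)
            gains[name_b] = math.sin(frac * math.pi / 2)
            return gains
    return gains
```

For example, `pairwise_pan(15.0)` drives only C and R, with equal gains, while `pairwise_pan(180.0)` drives only the two surround speakers. Whether such a direct extension localizes well, particularly across the wide surround pair, is exactly the kind of question that motivates examining alternate panning methods.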

The goal of this project is two-fold. First, several panning methods are analyzed using all available optimization techniques. In the past, pan pot designers have devised a new analytical method of optimization and developed a panning algorithm for that optimization. While other panning algorithms may have been analyzed using the new optimization method (and found not to match it), rarely were other optimization methods considered. Second, all algorithms in this project are tested experimentally through controlled listening tests. Listening tests of panning algorithms were rarely described in the literature so each algorithm’s subjective performance still needed to be assessed.

Scope

Multichannel panning methods are considered that control the azimuth of a monophonic sound source when reproduced over five loudspeakers (L, R, C, SL, SR) arrayed horizontally around the listener. The sixth "0.1" channel is ignored in this study because the low frequency content on it is not easily localized nor is it meant to be. Headphone-based 3-D audio systems or virtual surround systems using only two speakers are not considered. The target implementation for each panning algorithm is assumed to be software or a DSP chip, not analog circuitry.

We are concerned with reproduction of horizontal localization cues only in a domestic setting rather than in a movie theater. The most obvious difference between these two environments is scale. In a domestic application, the room is much smaller, the speakers are necessarily closer to the listener, and the entire listening area may be only about 10 ms wide versus as much as 40 ms in a movie theater [11].

The surround sound format for which the pan pot is applied shall be a discrete system in which there are an equal number of transmission and reproduction channels. This constraint eliminates systems employing kernel methods (e.g., Ambisonics) in which special decoders are necessary that inherently affect spatial information. A further assumption that all five channels carry full bandwidth information eliminates matrix surround sound systems (e.g., Dolby Surround Pro Logic) in which any broad bandwidth sound localization cues would not be available.

Finally, we make assumptions about the five loudspeakers themselves. We assume that they radiate primarily in one direction (i.e., monopoles), are matched in frequency response and sensitivity, and are aimed inward at a single point in the room. All speakers are assumed to have the same power available to them from the receiver. Dipole surround speakers, recommended by THX for home theaters, are meant to (re)produce ambience and specifically not allow easy localization of sounds to the sides and rear of the listener. Sounds panned outside the L, C, and R speakers using any panning method are not easily localized if dipole surround speakers are used.

Overview of Contents

Chapter 1 (Introduction) introduces background information. Chapter 2 (Spatial Hearing) describes several spatial hearing theories after introductions of necessary concepts from physics and psychology. These theories of spatial hearing must be understood before different panning methods may be explored. The last section of Chapter 2 describes how localization of sounds affects their perceptual grouping into different "streams."

Chapter 3 (IID-based Panning Methods) begins with the rationale for studying panning methods based on interaural intensity differences within the scope of this project. Criteria for evaluating pan pots are described so that different panning methods may be compared. Finally, panning methods are described that optimize for constant gain, constant power, velocity and energy vector equality, and observance of the azimuthal sampling theorem. A fifth hybrid algorithm is developed as a compromise between the constant power and velocity and energy vector optimizations.
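For orientation, the first three criteria named above are commonly written in the following textbook forms. The notation is an assumption made here and may differ from that used in Chapter 3: g_i is the gain applied to speaker i, and u_i is the unit vector pointing from the listener toward speaker i.

```latex
\begin{align}
\text{constant gain:}\quad   & \sum_{i=1}^{5} g_i = 1 \\
\text{constant power:}\quad  & \sum_{i=1}^{5} g_i^{2} = 1 \\
\text{velocity vector:}\quad & \mathbf{r}_V = \frac{\sum_{i} g_i \,\mathbf{u}_i}{\sum_{i} g_i} \\
\text{energy vector:}\quad   & \mathbf{r}_E = \frac{\sum_{i} g_i^{2} \,\mathbf{u}_i}{\sum_{i} g_i^{2}}
\end{align}
```

Under these definitions, velocity and energy vector equality is usually taken to mean that the two vectors point in the same direction for every pan position.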

In Chapter 4 (Listening Tests), an experiment is designed to test the algorithms for compliance with most of the quality criteria. The procedure is described in which listeners are asked to localize phantom images panned using each of four IID panning methods (not including constant gain). The results are analyzed statistically and the panning method(s) that best meet the tested quality criteria are determined.

Chapter 5 (Implementation) describes a simple software implementation of the best panning method(s). The multichannel pan pot is designed as a DirectX audio plug-in for Windows 95 and Windows NT. DirectX is described, and features and limitations for the necessary developers’ tools are given. The plug-in is designed to meet a simple set of desired features using an object-oriented design process.

In Chapter 6 (Conclusions), the project is summarized and its findings are interpreted. Recommendations are made for further work in this area, including ideas for new panning algorithms, better experimental design, implementation optimizations, and suggested features for commercial multichannel pan pots.

Jim West, University of Miami, Copyright 1998