Chapter 5: Implementation

The constant power and optimal panning algorithms were implemented in software as a single DirectX audio plug-in for Windows 95. A software implementation offered three advantages: development tools were readily available, skills with those tools already existed, and the result was more likely to be used as an actual product. An object-oriented design process was used to help meet the design goals of reusability, correctness, "changeability," completeness, efficiency, and simplicity. C++ was chosen as the programming language and Visual C++ 5.0 as the development environment.

An audio plug-in is a piece of software that tells the CPU or sound card how to process digital audio [78]. To perform their tasks, audio plug-ins require a host software application to "plug into." Host programs may offer digital audio recording and editing capabilities, and many add MIDI sequencing and music notation features. Users can process an audio file with a plug-in as if the plug-in’s features were part of the host program.

Several audio plug-in formats exist for the Macintosh and Windows operating systems. On the Windows platform, Microsoft’s DirectX format has become the standard that allows developers to create software for any compatible host [79]. While other proprietary Windows formats exist, the DirectX format has the advantage of a fairly robust object-oriented architecture that meets the current needs of the industry. Additionally, DirectX plug-ins offer Microsoft’s promise of future extensibility, inclusion in Win98 and WinNT 5.0, and hardware acceleration. Fifteen DirectX plug-ins were available as of August 1997 [78], and several more have debuted since. A number of host programs currently support this format, including Cakewalk Pro Audio, Sonic Foundry’s Sound Forge, Steinberg’s WaveLab, and Steinberg’s Cubase VST.

DirectX is a hierarchy of application programming interfaces (APIs) that facilitate communication within the Windows operating system [80] [81]. The DirectX foundation layer sits at the bottom and provides an encapsulated method for accessing hardware such as a sound card. Developers can program for the DirectX foundation without worrying about software drivers for the specific set of hardware on a user’s computer. Above the DirectX foundation is the media layer, called DirectX Media. DirectShow, part of the media layer and formerly called ActiveMovie, provides capture, processing, and playback of multimedia streams from local or remote sources. The "DirectX" audio plug-in is actually a DirectShow component. At the highest level are applications such as the host program that take advantage of DirectX Media and/or DirectX foundation layer services.

DirectShow replaces much of the functionality of the Media Control Interface (MCI), the original Windows multimedia API [78]. Unlike MCI, DirectShow provides a consistent, object-oriented interface that is multithreaded and not limited to a 16-bit implementation. Every object in DirectShow is based on the Component Object Model (COM), a language-independent API that ensures this consistent interface to Windows software developers.

Central to DirectShow is a system of modular components called filters, which are arranged in a configuration called a filter graph. (Note that DirectShow’s filters are defined differently from those in signal processing theory.) Another DirectShow component, the filter graph manager, controls the connection of these filters and the flow of multimedia data "through" them. Each filter has member objects called pins that allow specific types of data to flow. Because these pins are directional and strongly typed, filters cannot be connected arbitrarily in a filter graph [82]. Unlike signals in a piece of hardware, audio data does not literally flow through the pins; the pins merely provide interfaces to the filter.

Three types of filters are available: source filters, transform filters, and renderer filters. Source filters represent multimedia files or capture devices. These filters have one output pin and no input pins. Transform filters have input and output pins and are used to process the data stream. Renderer filters "play" the stream in a manner depending on its type. For instance, audio files are played through the sound card’s outputs and video files are displayed in a window on the computer monitor. Renderer filters have one input pin and no output pins.

The most basic filter graph would have one source filter and one renderer filter. This configuration could be used for playing an audio file without any processing. A typical filter graph would have a source filter, transform filter, and renderer filter in series. Figure 5.1 shows such a typical filter graph. The transform filter may be used to perform any desired signal processing on the stream before it is rendered. Very complex filter graphs are possible with many transform filters in series and parallel.
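The connection rules described above can be made concrete with a small model. The following C++ sketch is a simplified, hypothetical stand-in for DirectShow rather than its actual API: the Pin, Filter, MakeSource, MakeTransform, MakeRenderer, and Connect names are invented for illustration, and media types are reduced to strings. It shows how directional, strongly typed pins restrict the ways filters may be joined into a source → transform → renderer chain.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical model of filters and pins (not the DirectShow API).
enum class Direction { Input, Output };

struct Pin {
    Direction dir;
    std::string mediaType;  // e.g. "audio/pcm16"
    Pin* peer;              // connected pin, or nullptr
};

struct Filter {
    std::string kind;
    std::vector<Pin> inputs;
    std::vector<Pin> outputs;
};

// Source filter: one output pin, no input pins.
Filter MakeSource(const std::string& type) {
    return Filter{"source", {}, {Pin{Direction::Output, type, nullptr}}};
}

// Transform filter: one input pin and one output pin.
Filter MakeTransform(const std::string& type) {
    return Filter{"transform",
                  {Pin{Direction::Input, type, nullptr}},
                  {Pin{Direction::Output, type, nullptr}}};
}

// Renderer filter: one input pin, no output pins.
Filter MakeRenderer(const std::string& type) {
    return Filter{"renderer", {Pin{Direction::Input, type, nullptr}}, {}};
}

// Join an output pin to an input pin. Refuses the connection if the
// directions are wrong, the media types differ, or either pin is in use.
bool Connect(Pin& out, Pin& in) {
    if (out.dir != Direction::Output || in.dir != Direction::Input) return false;
    if (out.mediaType != in.mediaType) return false;
    if (out.peer != nullptr || in.peer != nullptr) return false;
    out.peer = &in;
    in.peer = &out;
    return true;
}
```

In this model, connecting the source to the transform and the transform to the renderer succeeds, while connecting an input pin as if it were an output, mismatched media types, or an already connected pin is refused. In DirectShow itself, pins negotiate a media type during connection rather than comparing strings.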

Thus, a DirectX audio plug-in is in fact a DirectShow transform filter or an encapsulated group of transform filters. The plug-in receives audio data from its input pin(s), performs any signal processing tasks, and delivers the processed data to its output pin(s). Host programs typically provide the filter graph manager, source filter, and renderer filter. DirectX audio plug-ins are made available to host programs as dynamic-link libraries (DLLs) with the .AX extension.

Sonic Foundry, developer of the host program Sound Forge, provides a Plug-In Development Kit (PIDK) as a supplement to Microsoft’s DirectX Media Software Development Kit (SDK) [80]. The documentation and samples in Sonic Foundry’s PIDK were designed to expand on the DirectX Media SDK in the area of digital audio signal processing [83]. At the time of writing, the current PIDK was version 4.0 (July 1997) and the current DirectX Media SDK was version 5.1 (November 1997). Both development kits were used to implement the pan pot plug-in.

Certain plug-in features are currently constrained by the PIDK, whose limitations stem from those of Sound Forge 4.0c with respect to plug-ins. Source and renderer plug-ins are not supported, but this does not affect our application. Only 8-bit PCM, 16-bit PCM, and 32-bit float connections between pins are supported. While 16-bit PCM connections were deemed sufficient for the pan pot plug-in, code was written for both 16-bit PCM and 32-bit float processing. Finally, and most relevant to our plug-in, the input and output pins of a plug-in are constrained to the four configurations listed in Table 5.1.

Table 5.1: Input and output pin configurations supported by the PIDK
Description	Number of Input Pins	Number of Output Pins
Mono in, Mono out	1	1
Mono in, Stereo out	1	2
Stereo in, Mono out	2	1
Stereo in, Stereo out	2	2

The numbers of input and output pins correspond to the numbers of input and output channels in our surround sound pan pot. This severely constrains our goal of a pan pot with a single monophonic input and five monophonic outputs. The only option was to implement the pan pot as a mono in, mono out device with an internal switch that routes one of five virtual outputs to the single output pin. Figure 5.2 shows a conceptual block diagram of this configuration. In a host program such as Sound Forge, the user selects an output channel, processes the input audio stream, saves the output file, and repeats the process for each of the other output channels.
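The mono in, mono out arrangement with an internal switch can be sketched as follows. Because each virtual output is simply the input scaled by that channel's gain, only the output selected by the switch actually needs to be computed on a given pass. The channel ordering (FR, C, FL, SL, SR, following Table 5.2), the PanGains type, and ProcessSelectedChannel are hypothetical; the gains would come from whichever panning algorithm is selected.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical ordering of the five virtual outputs.
enum Channel { FR = 0, C = 1, FL = 2, SL = 3, SR = 4 };
constexpr std::size_t kNumChannels = 5;

// One gain per virtual output, as produced by a panning algorithm.
using PanGains = std::array<double, kNumChannels>;

// Mono in, mono out: the switch position 'selected' decides which of the
// five virtual outputs reaches the single output pin.
std::vector<double> ProcessSelectedChannel(const std::vector<double>& input,
                                           const PanGains& gains,
                                           Channel selected) {
    std::vector<double> output(input.size());
    const double g = gains[selected];  // gain of the routed virtual output
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = g * input[i];
    }
    return output;
}
```

Running this once per value of `selected` reproduces the five-pass workflow described above, one output file per loudspeaker.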

Since this project’s purpose was to investigate different panning methods and not to develop a commercially available product, only a few basic features were included in the pan pot plug-in. Potential features are mentioned in Chapter 6 and should be considered before delivering this product to the marketplace. Note that a formal specification language was not used for this development project.

In our simple pan pot, the user may select (1) the panning algorithm (constant power or five-channel optimal), (2) the output channel, (3) one of three preset speaker arrangements (see Table 5.2), and (4) the panning angle. Radio buttons were chosen as the user interface controls for the first three of these features. The most readily available control for the pan pot itself was a horizontal linear slider. When the slider is centered, either manually or with the centering button provided, the panning angle is set to θ = 0°. As the slider is moved left of center, the panning angle moves counter-clockwise until it reaches θ = 179° at the far left. As the slider is moved right of center, the panning angle moves clockwise (from 359° downward) until it reaches θ = 180° at the far right. All panning angle changes are in 1° increments. Figure 5.3 shows all the features and controls of the pan pot plug-in.
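The mapping from slider position to panning angle reduces to one line of modular arithmetic. This sketch assumes, hypothetically, that the slider reports an integer position from -179 at the far left through 0 at center to +180 at the far right, one unit per degree; the plug-in's actual control range is not stated here. Angles are measured counter-clockwise from center front, as in Table 5.2, so positions right of center map to angles from 359° down to 180°.

```cpp
#include <cassert>

// Hypothetical slider range: -179 (far left) ... 0 (center) ... +180 (far right).
constexpr int kSliderMin = -179;
constexpr int kSliderMax = 180;

// Convert a slider position to a panning angle theta in whole degrees [0, 359].
// Left of center rotates counter-clockwise (theta = 1..179);
// right of center rotates clockwise (theta = 359 down to 180).
int SliderToPanAngle(int position) {
    assert(position >= kSliderMin && position <= kSliderMax);
    return (360 - position) % 360;
}
```

For example, a position of +30 (30° right of center) gives θ = 330°, the "typical" front-right loudspeaker azimuth in Table 5.2.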

Table 5.2: Preset loudspeaker azimuths (degrees, measured counter-clockwise from center front)
Loudspeaker	Ideal / Equiangular	"Typical"	Dolby Pro Logic / Dolby Digital Recommended
Front Right (FR)	288°	330°	337.5°
Center (C)	0°	0°	0°
Front Left (FL)	72°	30°	22.5°
Surround Left (SL)	144°	120°	90°
Surround Right (SR)	216°	240°	270°

COM is a specification of an architecture for connecting and manipulating components through their interfaces [80]. A COM interface is a set of functions implemented by a component (such as a plug-in) and used by another component, called the client. Interfaces hide the implementation details of a component and allow a client to treat different components that expose the same interface in the same way. These two characteristics are known as encapsulation and polymorphism, respectively. All COM components implement the IUnknown interface, and most implement additional interfaces.
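The relationship between IUnknown, encapsulation, and polymorphism can be illustrated without the Windows headers. The following is a simplified imitation of COM in portable C++, not real COM: actual COM identifies interfaces by GUIDs, returns HRESULT codes, and uses the Windows calling convention, all of which are omitted here. IUnknownLike, IGainControl, and GainFilter are invented names. The client reaches the component only through interface pointers obtained from QueryInterface, and the component manages its own lifetime through reference counting.

```cpp
#include <string>

// Simplified stand-in for COM's IUnknown: interface lookup plus reference counting.
struct IUnknownLike {
    virtual bool QueryInterface(const std::string& iid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    ~IUnknownLike() = default;  // clients call Release(); they never delete
};

// A second, hypothetical interface exposed by a component.
struct IGainControl : IUnknownLike {
    virtual void SetGain(double g) = 0;
    virtual double GetGain() const = 0;
protected:
    ~IGainControl() = default;
};

// The component: its state is hidden behind the interfaces (encapsulation).
class GainFilter final : public IGainControl {
public:
    bool QueryInterface(const std::string& iid, void** ppv) override {
        if (iid == "IUnknown") {
            *ppv = static_cast<IUnknownLike*>(this);
        } else if (iid == "IGainControl") {
            *ppv = static_cast<IGainControl*>(this);
        } else {
            *ppv = nullptr;
            return false;               // interface not supported
        }
        AddRef();                       // a returned interface holds a reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long n = --refs_;
        if (n == 0) delete this;        // the component frees itself
        return n;
    }
    void SetGain(double g) override { gain_ = g; }
    double GetGain() const override { return gain_; }

private:
    ~GainFilter() = default;
    unsigned long refs_ = 1;            // the creator holds the first reference
    double gain_ = 1.0;
};
```

A client holding only an IGainControl pointer can adjust the gain without knowing that the GainFilter class exists, and any other component exposing the same interface could be substituted.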

With these concepts in mind, we can give an overview of the plug-in’s object-oriented design. The plug-in’s main class, CSfPlugIn, is provided in the PIDK and can easily be adapted for use in any plug-in. CSfPlugIn inherits from, and overrides some functionality of, the CTransInPlaceFilter and CAudioTransform base classes. CTransInPlaceFilter is an abstract base class that supports a simple transform filter with a single input and a single output. It is provided in the DirectShow SDK and is derived from CTransformFilter, which provides the IUnknown interface, the IFilter interface, the IMediaFilter interface, and two pins. The input and output pins, derived from CBaseInputPin and CBaseOutputPin, expose the IPin interface. Thus, most of CSfPlugIn’s DirectShow functionality is inherited from CTransInPlaceFilter.
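The inheritance structure can be outlined with stand-in base classes. The real CTransInPlaceFilter and CAudioTransform classes come from the DirectShow SDK and the PIDK and have much larger interfaces; TransInPlaceFilterStub and AudioTransformStub below are hypothetical mock-ups, and the Transform16bit and TransformFloat signatures shown are assumptions rather than the PIDK's. PanPotPlugIn plays the role of CSfPlugIn, and its overrides apply the selected channel's gain to 16-bit PCM (with rounding and clipping) and to 32-bit float samples.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the DirectShow in-place transform base class
// (pin handling and buffer management omitted).
class TransInPlaceFilterStub {
public:
    virtual ~TransInPlaceFilterStub() = default;
};

// Hypothetical stand-in for the PIDK audio base class: derived classes
// override one hook per supported sample format.
class AudioTransformStub {
public:
    virtual ~AudioTransformStub() = default;
    virtual void Transform16bit(std::int16_t* samples, std::size_t count) = 0;
    virtual void TransformFloat(float* samples, std::size_t count) = 0;
};

// Plays the role of CSfPlugIn: filter behavior from one base class,
// audio-processing hooks from the other.
class PanPotPlugIn : public TransInPlaceFilterStub, public AudioTransformStub {
public:
    explicit PanPotPlugIn(double gain) : gain_(gain) {}

    // Scale 16-bit PCM in place; round to nearest and clip to the int16
    // range as a safeguard against gains greater than 1.
    void Transform16bit(std::int16_t* samples, std::size_t count) override {
        for (std::size_t i = 0; i < count; ++i) {
            double y = std::round(gain_ * samples[i]);
            if (y > 32767.0) y = 32767.0;
            if (y < -32768.0) y = -32768.0;
            samples[i] = static_cast<std::int16_t>(y);
        }
    }

    // Scale 32-bit float samples in place.
    void TransformFloat(float* samples, std::size_t count) override {
        for (std::size_t i = 0; i < count; ++i) {
            samples[i] = static_cast<float>(gain_ * samples[i]);
        }
    }

private:
    double gain_;  // gain for the currently selected output channel
};
```

The host calls the processing hook through the base-class interface, so the same calling code works for any plug-in derived from the audio base class.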

CAudioTransform is provided in the PIDK, and its Transform16bit( ) and TransformFloat( ) member functions are overridden to perform the actual "transform" of the audio data. In this plug-in, these functions call GetPanMultipliers( ) to compute the channel gains. GetPanMultipliers( ) takes the panning type, output speaker channel, speaker azimuths, and panning angle as arguments and calculates the five channel gains using the constant power or optimal panning algorithm. For the optimal panning algorithm, it calls epsilon( ), which computes ε_a or ε_b as in Eqs. 3.35 and 3.36. GetPanMultipliers( ) returns a structure containing all five panning gains as doubles, and Transform16bit( ) or TransformFloat( ) multiplies the audio samples in the current input buffer by the gain for the currently selected channel. The C++ code segments for GetPanMultipliers( ) and epsilon( ) are shown here for reference. Note again that the code for the optimal algorithm may fall under patent or copyright protection by Gerzon.
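For the constant power case, the gain calculation can be sketched as follows. This is a hypothetical reconstruction, not the thesis code, and it is named ConstantPowerGains to keep it distinct from GetPanMultipliers( ): it assumes the common pairwise sine/cosine law, in which only the two loudspeakers adjacent to θ receive signal and their gains satisfy g1² + g2² = 1. The formulation in Chapter 3 may differ in detail, and the optimal algorithm and epsilon( ) are deliberately not reproduced. Azimuths are in degrees, counter-clockwise from center front, as in Table 5.2.

```cpp
#include <array>
#include <cmath>

constexpr int kNumSpeakers = 5;
constexpr double kPi = 3.14159265358979323846;

// Hypothetical result type: one gain per loudspeaker, in azimuth-array order.
struct PanMultipliers {
    std::array<double, kNumSpeakers> gain{};  // all gains start at zero
};

// Angle swept counter-clockwise from 'from' to 'to', in [0, 360).
double CcwDistance(double from, double to) {
    double d = std::fmod(to - from, 360.0);
    return d < 0.0 ? d + 360.0 : d;
}

// Pairwise constant-power panning (sine/cosine law), theta in degrees.
PanMultipliers ConstantPowerGains(const std::array<double, kNumSpeakers>& azimuth,
                                  double theta) {
    int a = 0;  // nearest speaker at or clockwise of theta
    int b = 0;  // nearest speaker at or counter-clockwise of theta
    for (int i = 1; i < kNumSpeakers; ++i) {
        if (CcwDistance(azimuth[i], theta) < CcwDistance(azimuth[a], theta)) a = i;
        if (CcwDistance(theta, azimuth[i]) < CcwDistance(theta, azimuth[b])) b = i;
    }
    PanMultipliers result;
    if (a == b) {  // theta coincides with a loudspeaker
        result.gain[a] = 1.0;
        return result;
    }
    // Fractional position of theta across the arc from speaker a to speaker b.
    double x = CcwDistance(azimuth[a], theta) / CcwDistance(azimuth[a], azimuth[b]);
    result.gain[a] = std::cos(x * kPi / 2.0);
    result.gain[b] = std::sin(x * kPi / 2.0);
    return result;
}
```

With the "typical" preset ordered FR, C, FL, SL, SR (330°, 0°, 30°, 120°, 240°), θ = 15° falls midway between C and FL, so both receive a gain of √0.5 ≈ 0.707, while θ = 0° sends the full signal to C alone.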

An instance of the CSfPlugInPropPage class forms the plug-in’s property page, a special type of window that contains all controls accessible to the user. When the user changes a control on the property page, a member variable of CAudioTransform changes value. This process, described below and in [84], is somewhat convoluted because the property page and the processing code usually run in different threads.
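The threading problem can be illustrated in general terms. The sketch below does not reproduce the mechanism of [84] or the PIDK; it is a hypothetical, portable C++11 illustration in which the property-page (UI) thread writes new settings under a mutex and the processing thread copies a consistent snapshot once per buffer, so no buffer is processed with a mixture of old and new settings. PanSettings and SharedPanSettings are invented names.

```cpp
#include <mutex>

// Hypothetical set of user-controlled parameters.
struct PanSettings {
    int algorithm = 0;      // 0 = constant power, 1 = optimal
    int outputChannel = 1;  // index of the selected loudspeaker
    int arrangement = 0;    // preset speaker arrangement
    int angleDegrees = 0;   // panning angle theta
};

// Settings shared between the property-page thread and the processing thread.
class SharedPanSettings {
public:
    // Called from the UI thread when the user changes a control.
    void Update(const PanSettings& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        settings_ = s;
    }

    // Called from the processing thread at the start of each buffer.
    PanSettings Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return settings_;
    }

private:
    mutable std::mutex mutex_;
    PanSettings settings_;
};
```

Taking the snapshot once per buffer also means that a slider movement takes effect at the next buffer boundary rather than partway through a buffer.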

Using the plug-in is a straightforward process. It is assumed that a monophonic, 16-bit, 44.1 kHz digital audio recording already exists in WAV file format. The recording engineer first launches a host application such as Sound Forge and opens the audio file of interest. He or she selects the audio file (or a section of it) and runs the pan pot plug-in to process the selection. (In Sound Forge, plug-ins are listed under the DirectX menu.)

When the plug-in window appears, the engineer may choose the type of panning, the speaker channel, and the speaker placement configuration as described above. The engineer then selects the desired panning angle and clicks the OK button to pan the audio selection for the selected channel. The window then closes, and the engineer can save the panned selection under a different file name (e.g., hurdy-gurdy_solo_SL.wav). To complete the panning of the original file, the engineer repeats the process for each of the five speaker channels. The resulting five audio files can then be auditioned over a 5.1 surround sound system, with no signal on the .1 (LFE) channel.