Using the power of audio to build immersive and transportive soundscapes for your podcast




This is Part One of a two-part series. In this first part, Sound Designer Mark Angly sets the stage for building an immersive soundscape — determining when to use sound effects, how to find the right ambience, and building an effective atmospheric layer.

In Part One, Mark begins by asking an elemental question for podcast sound designers.

When is the right time for sound effects?

In sound design, different formats necessitate different decisions about sonic content and treatment. A movie soundtrack has different objectives than a song or a live radio program. Podcasts are still a fairly new medium, so what “podcast sound” even means is still evolving.

When evaluating the right time for sound effects, I might consider: what kind of show are we making? What kind of sound reinforcement does it need? In some cases, the question is as specific as, “what kind of music could introduce this interview?” And in other cases where the content speaks for itself, little other embellishment is needed or welcome at all.

But often, audio storytelling can be greatly enhanced if the audience can hear the sound of the action alongside its description. When the listener is invited to put themselves into another time and place, additional sound effects and ambience can be extremely immersive.

Where to start?

First, I figure out what Sound Effects (SFX) I need to add and why.

What am I trying to accomplish?

What’s the emotional tone of the piece I’m working on?

Then I start thinking about structure and how to introduce sound. For example, when a guest on the podcast starts recounting their story, I need to decide whether it is more appropriate to slowly introduce some supporting sounds underneath their narration, or to establish the sound first, giving the listener time to get acquainted with the space they’re in, and then roll the tape.

What choice do I make?

The answer is, “It depends.” The point is to be intentional.

I like to play the raw dialogue or read the transcript and ask myself, “What do I want to hear? What’s missing?”

So now I’ve (hopefully) identified why I need some SFX and a rough outline of the structure. The next logical step is to open up my SFX management software of choice (Soundly is a good free option with an additional subscription-based “pro” SFX library) and get to finding sounds.

Next, go to the ‘library’ and get wordy

Say my podcast host is interviewing someone in a city park, and we’d like to add some additional “park ambience.” Let’s fire up the search and oh noooo…

1,336 is a lot of results to sift through, and that’s a small number compared to some search terms. So one skill you’ll need to refine is learning good keywords! The Universal Category System initiative is trying to standardize these keywords throughout the SFX recording & editing industry, so it may be worth reading up on if you’re interested. Entering “ambience park day birds” yields a much more reasonable 29 results, so I can start auditioning sounds.
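To see why extra keywords shrink the result count so quickly, here is a minimal Python sketch of an "every keyword must match" search. The catalogue entries and the search function are hypothetical, made up for illustration; the article does not describe how Soundly or any other SFX manager actually matches terms, so treat the matching rule as an assumption.

```python
# Hypothetical catalogue of (filename, description) pairs.
# These entries are invented for illustration, not taken from a real library.
LIBRARY = [
    ("park_day_01.wav", "ambience park day birds children distant"),
    ("park_night_01.wav", "ambience park night crickets"),
    ("car_park_01.wav", "car park garage interior hum"),
    ("forest_birds_01.wav", "ambience forest day birds"),
]


def search(library, query):
    """Return filenames whose description contains every word in the query.

    Assumption: the search is a simple AND over whole words. A real SFX
    manager may rank, stem, or fuzzy-match terms differently.
    """
    terms = query.lower().split()
    results = []
    for filename, description in library:
        words = set(description.lower().split())
        if all(term in words for term in terms):
            results.append(filename)
    return results


broad = search(LIBRARY, "park")
narrow = search(LIBRARY, "ambience park day birds")

print(len(broad), "results for 'park':", broad)
print(len(narrow), "results for 'ambience park day birds':", narrow)
```

Under this rule each added keyword can only remove results, which is why describing the sound you want (type, place, time of day, detail) is usually more productive than a single broad term.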

What makes one sound better or more useful than another?

Now we are ready to start adding sounds!

But what am I listening for? What makes one sound better or more useful than another?

I filter sounds through a hierarchy of attributes in order to narrow the pool:

1. Emotion. How does the sound make me feel? Does that feeling complement the tone of the piece?

2. Variety. How much do I have to work with? If you are creating ambience, the sample will need to be long enough to still be usable after cutting out any unwanted noise. If you will be repeating elements, is there enough variety to use the sample without it sounding “looped?” This is especially important for things like foley footsteps.

3. Recording Quality. Hopefully, your podcast host is recorded in a decent room on a decent microphone. If I’m complementing their recording with SFX that sound obviously different in character and quality (e.g. a 96 kbps MP3 conversion of a tape transfer of a vinyl print of a door slam recorded 60 years ago), the SFX will stick out like a sore thumb.

4. Technical Accuracy. Is the bird sound I’m adding in the background actually native to the region being constructed? Better yet, is it accurate for the time of year? Note that this kind of accuracy is very important to some people, but can also be completely ignored for creative purposes. Again, it’s less about following the rules and more about choosing which ones to consciously break.

On this point, not all SFX libraries are created equal. The more expensive and expansive libraries have been curated by professionals to ensure that their sounds meet these standards. If you’re not working with effects very often, it may not be financially viable to spend hundreds or thousands of dollars on a sound library. But if you are, they can save a lot of time and heartache.

Building an immersive environment is like layering a cake

If we think of a scene as a cake, the Environmental/Atmospheric sound elements (commonly ‘atmos’ or backgrounds/BGs) are the base layer. Recording quality is generally important for atmos because their function is to transport the listener to another place, and people’s ears and brains are very good at hearing what’s around them. Binaural/ambisonic recordings do a particularly good job of this, but can also be distracting if overused, or if used without any other binaural treatment. Basically, a well-recorded ambience can be the difference between kinda okay immersion and “wait, was that actually thunder outside?”

With that in mind, I always search for contrasting but complementary layers of atmos. You may know about Gestalt Theory as it applies to visual elements — the idea that the mind fills in the gaps when perceiving patterns — but it also applies to sound. Using several layered tracks gives the sensation of greater depth and “realism,” and also means I can emphasize different elements of the environment as time goes on, as the story unfolds. Busier (and therefore more distracting) BG elements can fade down as the narrative progresses and we focus more on the words being said. In some cases, you can fade different layers up and down to imply physical movement.

Some examples of contrasting but complementary qualities:

  • High Frequency/Low Frequency, e.g. cityscape [01 and 02]: The wide, deep city rumble complements the streetside recording and gives a greater sense of depth and scale.

[01 City Scape]

[02 City Scape]

  • Busy/Sparse, e.g. birds [03 and 04]: The birds on [03] are pleasant but, again, too busy to play under most narration. They can fade down or out completely while the forest atmos [04] continues underneath.

[03 Birds]

[04 Birds]

  • Dynamic/Static, e.g. harbor [05 and 06]: The marina ambience [05] is a good general background but lacks detail. The waves [06] breaking on the dock can be edited to punctuate the start of the scene, or a specific moment in a story, and then ducked down so as not to overwhelm the story.

[05 Harbor]

[06 Harbor]
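In practice this layering happens with faders and automation in an audio editor, but the underlying arithmetic is simple enough to sketch. The Python below is an illustration under stated assumptions, not a description of my actual session: the two "recordings" are short stand-in lists of numbers, and the function names are hypothetical. It fades a busy layer out while a sparse bed holds steady, as in the birds example above.

```python
def linear_fade(start_gain, end_gain, length):
    """Gain envelope that moves linearly from start_gain to end_gain."""
    if length <= 0:
        return []
    if length == 1:
        return [end_gain]
    step = (end_gain - start_gain) / (length - 1)
    return [start_gain + step * i for i in range(length)]


def mix_layers(layers):
    """Mix (samples, gains) pairs by scaling each layer and summing.

    The result is as long as the shortest layer. Real audio would also
    need headroom or limiting so the summed signal does not clip.
    """
    length = min(min(len(samples), len(gains)) for samples, gains in layers)
    mixed = []
    for i in range(length):
        mixed.append(sum(samples[i] * gains[i] for samples, gains in layers))
    return mixed


# Stand-in signals: constant values make the gain changes easy to read.
n = 5
busy_birds = [1.0] * n   # the busy layer, to be faded out
forest_bed = [0.5] * n   # the sparse layer, held at full level

mix = mix_layers([
    (busy_birds, linear_fade(1.0, 0.0, n)),
    (forest_bed, [1.0] * n),
])
print(mix)  # [1.5, 1.25, 1.0, 0.75, 0.5]
```

Swapping the fade direction, or giving each layer its own envelope, is how one layer can recede while another comes forward to suggest movement through a space.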

A final note about searching for sounds: sometimes a podcast script calls for something that makes little (or no) sound, makes a bad sound, or has simply never been recorded (and isn’t practical to record myself). In those cases, you can either argue that extra sound effects aren’t needed or make something up from constituent parts. For example, take this story about Francis Curtis and his prototype steam-powered car.

As best I can tell, steam cars don’t actually make a lot of sound. There’s also no way to find out what a steam car from 1866 sounded like. But since the tone of the show is a little playful overall, I took some license and made the sound of the car from these layers to help give a little life to the story:

Layer 1

Layer 2

Layer 3

Add in some more steam, raise and lower the pitch, and you’re off to the races.

Combined Layers
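For the curious, raising or lowering pitch can be approximated by playing a sound back faster or slower. The Python sketch below shows that idea with simple linear interpolation. It is an assumption-laden illustration: the function name is hypothetical, the input is a tiny stand-in list rather than real audio, and this "varispeed" approach changes the length of the sound along with its pitch, which is not necessarily how the pitch tools in a given audio editor work.

```python
def change_pitch(samples, ratio):
    """Resample by `ratio` using linear interpolation.

    ratio > 1 raises the pitch and shortens the sound;
    ratio < 1 lowers the pitch and lengthens it.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio              # fractional read position in the input
        left = min(int(pos), last)   # sample at or before the position
        right = min(left + 1, last)  # next sample, clamped at the end
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# Stand-in "recording": a short ramp so the resampling is easy to follow.
steam_hiss = [0.0, 1.0, 2.0, 3.0]

higher = change_pitch(steam_hiss, 2.0)  # twice as fast, half as long
lower = change_pitch(steam_hiss, 0.5)   # half speed, twice as long

print(higher)  # [0.0, 2.0]
print(lower)   # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Sweeping the ratio over time, rather than holding it fixed, is one way to suggest an engine speeding up or slowing down.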

Placing SFX / Pacing and Timing

As with comedy, effective SFX in podcasts are all about timing. Usually, timing is dictated by the Voice Over (VO), as well as whether the sound in question is a background or a foreground element, and whether it’s supporting the words or interjecting between/on top of them. This is again very situational, but it helps to look at motivation.

Is the sound a realistic ambience?

Listen to the ambient sound

Is it attention-grabbing?

Or is it weaving in and out to help tell the story?

In thinking about the timing, the “right place” for a sound always depends on what’s around it. Everything is relative! If nothing else, I at least try to be intentional with timing.

Everything I’ve said so far and the examples I’ve shown represent one approach and one philosophy of many. We all bring our own experience to the table when we sit down to create sounds, and I’m extremely grateful to be working with such a diverse and talented group of sound designers at Pacific Content; every time I hear their work, I find a new approach or a new way to problem solve. Even in collecting these examples, I was reminded of how my process has evolved over the past few years.

So when you’re sitting down to build your own soundscapes, intention is crucial. Any sound you add should enhance the story, or at least set up something else that does. Think about how the different layers complement each other, how they interact with the vocal and musical elements around them, and how the whole thing flows from scene to scene, from section to section. And while you’re at it, try to have some fun!

