GODOT DEVLOG #2: Trying to Listen

recording, playing back, waveforms, forum posts, plonk mastery

February 6: Recording and Playback

After creating a nice little foundation (character controller, tree, bird, sounds, interactions, physics) last week, I wanted to expand on the player systems. This is part of the prototyping stage: I need to see if my ideas for the systems work on a small scale before I expand them to a larger scale and flesh out the details.

One framework for understanding gameplay and game design is to think of the player’s actions as forming a game loop. What a fun pair of words. Can we just appreciate that for a moment? Game loop. Sounds like something I want to be stuck in. The concept is pretty self-explanatory, I think: the game loop describes the interlocking sequence of actions the player takes, and the systems that draw the player through those actions. Get a quest, leave town, slay monsters, return, level up and gear up, and repeat with variation. Or: load in, find loot, avoid the storm, find enemies, kill or die, repeat. A well-balanced loop carefully guides the player with incentives and rewards, gently pulling them through the plot or some other form of linear progression. A fucked-up game loop leaves players wondering what to do next, or why they’re doing what they’re doing.

A major part of this game’s loop is going to be recording sounds in the environment and storing them for playback later. So my first task was to figure out how to record sounds. Nay! To figure out how audio works in Godot in the first place. I read through the documentation, where I found a tutorial and an example Godot project showing how to access the user’s microphone, record from it, and play the recording back in-game.

I replicated the tutorial’s code in my own project, but for about an hour nothing seemed to work. For the life of me, I could not get any signal from my microphone in my game, even though the tutorial project worked just fine. By sheer luck, I found an “Allow Audio Input” option in Godot’s preferences. Neither the documentation nor any of the posts I read mentioned this toggle, and I almost gave up right there! It just goes to show: don’t give up! The solution might be two clicks away, hidden in a menu somewhere.

But I didn’t want to record the microphone for my game. I wanted to record in-game sounds. I kept digging for clues and solutions and found a Reddit post where someone had figured this out. Thank you, user plonkmaster_jones. I can see why they call you the plonk master.

Eventually, I was able to capture in-game audio recordings by hitting R and play them back by hitting T. I even implemented a little colored square in the corner to indicate the recording status: red for recording, blue for idle, green for playing back. So now I can shut off that pesky speaker and get a clear recording of the bird to listen to as I fall for eternity, as you can see in this video:
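If you strip away the audio plumbing, the R/T controls and the status square amount to a tiny state machine. The following is a minimal, engine-free Python sketch, not the project’s GDScript, and it captures no audio. It assumes that R toggles recording on and off and that T only starts playback from idle once a clip exists. The class, method, and state names are all hypothetical.

```python
from enum import Enum


class RecState(Enum):
    """Hypothetical recorder states; each value is the status square's color."""
    IDLE = "blue"
    RECORDING = "red"
    PLAYING = "green"


class Recorder:
    """Tracks recorder state for the R/T controls (assumed behavior, no audio)."""

    def __init__(self) -> None:
        self.state = RecState.IDLE
        self.has_clip = False  # becomes True once a recording has been stopped

    def press(self, key: str) -> RecState:
        if key == "R":
            if self.state is RecState.RECORDING:
                # Assumption: a second press of R stops recording and keeps the clip.
                self.state = RecState.IDLE
                self.has_clip = True
            elif self.state is RecState.IDLE:
                self.state = RecState.RECORDING
        elif key == "T":
            # Assumption: playback only starts from idle, and only if a clip exists.
            if self.state is RecState.IDLE and self.has_clip:
                self.state = RecState.PLAYING
        return self.state

    def playback_finished(self) -> None:
        """Call when the clip ends so the square goes back to blue."""
        if self.state is RecState.PLAYING:
            self.state = RecState.IDLE

    @property
    def square_color(self) -> str:
        return self.state.value
```

Keeping the color as the enum's value means the status square can never disagree with the actual state.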

Great! Can’t believe that only took me a day to figure out; I was really expecting that part to be tricky. This code will probably change as I streamline and systematize things in the future, but here’s the recording code I wrote:

February 7: Waveform Worries

So now we can record things. The recording system is ugly and held together with bad code and digital duct tape, but it works. On to the next part of the game loop: the player’s recording will be scored, or measured for accuracy. I want to build a system that can check whether the player’s recording of the bird is clear and accurate. How can I do this?

After spending two days on this question, I’ve realized it’s quite a doozy. Actually comparing the contents of two audio recordings, and checking how faithful one is to the other, would take some really sophisticated software. But I wouldn’t have gotten this far into the project if I didn’t have a few ideas for solutions.

My initial idea was this: find some way to turn the recording’s audio into a visual waveform (like the ones you see on SoundCloud). Then, generate a second waveform from the correct, clear recording of the bird, i.e. the ‘target’ sound you’re trying to match. Finally, compare the two waveforms.

So I set out to find a way to generate a waveform in Godot, and I was immediately a bit disheartened. I found a few abandoned, unanswered forum threads from people who needed help with the same task, some from a few years ago. I even found one poster who adamantly insisted it was simply impossible. A lot of these posts linked to each other across cyberspace, building on ideas from other threads years before and refining pieces of code from other posters. It was a nice microcosm of public problem solving, but I found no easy solution. There were a few hints of success here and there. In classic fashion, I found one programmer in Mexico who claimed he had cracked it, and he posted a screenshot of the most beautiful waveforms I’ve ever seen. He said he would “post the code soon,” but the thread stopped there. This was in 2019. (Naturally, I emailed him, but no response yet.) I continued my quest, bookmarking lots of pages along the way...

One user, whose posts I found on several different websites and forums about this subject, had some success by turning the audio file into a PoolByteArray (basically turning the audio into a huge array of something like 16,000 numbers, pure data) and using Godot’s draw functions to draw a line for each data point. Again, I’m new to all this shit, especially anything related to draw functions. So I played with that idea, creating a line for each data point and trying to arrange them somewhere on the screen. I had middling success:

But it was an absolute slog of trial and error. One of the issues was that the PoolByteArrays were huge and unpredictable: a 4-second chirp was something like 30,000 data points (each data point is a number from 0 to 255), and a 2-second ribbit was twice that, for some reason.
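One plausible source of that unpredictability is that a raw audio buffer’s length depends on the clip’s format, not only on its duration. For uncompressed PCM, bytes = seconds × sample rate × channels × bytes per sample. In 16-bit audio, every two bytes form one signed sample, so each individual 0-255 value is only half of a real amplitude. The Python sketch below shows both ideas. It assumes little-endian signed 16-bit PCM, which is how WAV files store 16-bit audio, and both function names are hypothetical.

```python
import struct


def expected_byte_count(seconds, mix_rate, stereo, bits):
    """Raw PCM size: seconds x samples per second x channels x bytes per sample."""
    channels = 2 if stereo else 1
    return int(seconds * mix_rate * channels * (bits // 8))


def decode_pcm16(raw):
    """Turn little-endian signed 16-bit PCM bytes into sample values.

    Each sample spans two bytes, so the real amplitude runs from
    -32768 to 32767 rather than 0 to 255.
    """
    count = len(raw) // 2  # ignore a trailing odd byte, if any
    return list(struct.unpack("<%dh" % count, raw[: count * 2]))
```

For example, at the same sample rate and bit depth, a 2-second stereo clip takes exactly as many bytes as a 4-second mono clip.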

If I tried to generate a line for each data point, my computer would nearly crash. So I also had to find some way to compress the data while still keeping its shape. I literally had to sit down and work out a math problem on paper to figure out what to do next.
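Drawing “a line for each data point” boils down to mapping each array index to an x position and each value to a line length. The following is a minimal Python sketch. It assumes values from 0 to 255 and a fixed-size drawing area, with y growing downward as on screen. In Godot, each returned pair of points would be handed to a draw call such as CanvasItem’s draw_line. The function name is hypothetical.

```python
def bars_for(data, width, height):
    """Return one vertical segment ((x, y_top), (x, y_bottom)) per data point.

    Each value (assumed 0-255) becomes a bar centered on the middle of a
    width x height area; a value of 255 spans the full height.
    """
    if not data:
        return []
    mid = height / 2
    spacing = width / len(data)  # horizontal gap between neighboring bars
    segments = []
    for i, value in enumerate(data):
        x = i * spacing
        half = (value / 255) * mid  # half-length of this bar
        segments.append(((x, mid - half), (x, mid + half)))
    return segments
```

With tens of thousands of values, this produces tens of thousands of segments, which is exactly what makes per-point drawing so expensive.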

Luckily, the for loop I was using (“for every number in array, generate xyz line”) had built-in interval functionality (“for every 200th number in array, generate xyz line”). To figure out the interval, I divided the data array’s size by 200 and used that number as the interval. That way, I could turn 30,000 numbers and 300,000 numbers into simplified lists of about 200, while still maintaining the general shape of the data. Quick and dirty, but it does the trick for now.
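The interval trick amounts to keeping every n-th value, where n is the array’s size divided by the number of points you want. The following is a minimal Python sketch that assumes the data is already a plain list of numbers. Slicing with a step plays the role of the stepped for loop. The function name and the default of 200 are illustrative.

```python
def downsample(data, target=200):
    """Keep every n-th value so that roughly `target` values remain.

    n is len(data) // target (at least 1), matching "divide the array's
    size by 200 and use that number as the interval".
    """
    step = max(1, len(data) // target)
    return data[::step]
```

Because this simply skips values, a short spike that falls between two kept samples can vanish entirely. A common alternative is to keep the largest value from each chunk instead, which preserves peaks.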

Eventually, one of the formulas started working for me. I was getting a waveform-ish shape!

At this stage, the waveform system is still totally separate from my recording/playback system; they don’t interact yet. This is the waveform generated from a wav file of a robin chirp.

The result is not a pretty waveform, but it’s close. Again: prototype. I’m just trying to see if I can get it to work. Refinement can come later.

So I can generate a waveform. Hell, I can generate one right on top of another:

Now we’re getting somewhere.

Next I had to figure out how to compare the two shapes. My first idea was to save the waveforms as separate PNGs, then install an outside code library I’d found that provides a “find average pixel color” function, which takes an image and averages all of its colors into one value. In theory, I could find the average color of each waveform and use the two as a barometer for accuracy. If there were a bunch of extra sounds in the player’s recording, the averages would be way off; if the recording of the bird was clear, it would have a similar number of lines, and therefore a similar average pixel color.
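Here is a minimal Python sketch that makes the average-color idea concrete. Rather than calling any real image library, it treats each saved waveform image as rows of grayscale values from 0 to 255. Both function names and the 0-to-1 similarity scale are hypothetical.

```python
def average_brightness(pixels):
    """Mean of all grayscale values (0-255) in an image given as a list of rows."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    return total / count if count else 0.0


def similarity(a, b):
    """1.0 when the two averages match, falling to 0.0 as they diverge fully."""
    return 1.0 - abs(average_brightness(a) - average_brightness(b)) / 255
```

One weakness shows up right away. Two very different images can share the same average, because averaging throws away where the lines actually are.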

I casually mentioned this idea to my friend Steve, who offered a simpler idea: why not just measure the differences in the y-values of the lines? It seemed hard until I thought about it for like twenty minutes.

Since I was basically working with two huge sets of data, couldn’t I just find the difference between corresponding points? The answer is yes, and I generated a third waveform representing the differences between the values behind the points of the first two waveforms:
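The y-value comparison can be sketched in a few lines of Python. The sketch assumes both waveforms were first reduced to lists on the same 0-255 scale, for instance about 200 points each. It takes the absolute difference at each index and compares only the overlapping length. The 0-100 score at the end is a hypothetical way to turn those differences into a grade, not something from the project.

```python
def difference(a, b):
    """Point-by-point absolute difference of two waveforms."""
    n = min(len(a), len(b))  # compare only the part both waveforms cover
    return [abs(a[i] - b[i]) for i in range(n)]


def accuracy(a, b, max_value=255):
    """Hypothetical score: 100 for identical waveforms, 0 for maximal difference."""
    diff = difference(a, b)
    if not diff:
        return 0.0
    mean = sum(diff) / len(diff)
    return 100.0 * (1 - mean / max_value)
```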

It looks a little wonky but it also looks like progress. This is another lesson: ask your friend Steve sometimes.

So far, I can already see a million problems with this system. What if the recording starts 3 seconds too early, and everything is way off in the comparison? What’s up with all those spiky bits in the waveforms? Why isn’t it smoother? I’m trying to tune out those criticisms for now, until we get to a stage where I can test. This could still work! But next I must link my systems. Here’s my waveform code, btw: