To produce our Fanboys podcast, I have to sync my local recordings with those of the others participating via FaceTime. I usually do this by lining up the tracks in an audio editor. After years of that, today I finally had the idea to automate it with a simple Python script. Yay!


It works like this: First I calculate a minimal overlapping area of audio (because sometimes some recordings start way earlier than others), and pick a small area to sample in the audio I need to sync. That sample gets further reduced to a 2-second snippet where someone is actually speaking, via voice segmentation. After that I match for that snippet in the master track via FFT convolution. Then it's just calculating offsets and cutting WAV files: fully automatic double-ender. I'm so happy!
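
For anyone curious what the matching step could look like, here is a minimal sketch. It is not the actual script: it assumes NumPy, mono float samples at the same sample rate, and the names `find_offset` and `snippet_start` are made up for illustration. Cross-correlation is done as an FFT convolution with the time-reversed snippet, and the peak gives the position of the snippet in the master track.

```python
import numpy as np


def find_offset(master, snippet):
    """Return the sample index in `master` where `snippet` matches best.

    Hypothetical helper: cross-correlates by convolving `master` with the
    time-reversed `snippet` in the frequency domain.
    """
    master = np.asarray(master, dtype=np.float64)
    snippet = np.asarray(snippet, dtype=np.float64)

    # Length of the full linear convolution, padded to a power of two
    # so the circular FFT product does not wrap around.
    n = len(master) + len(snippet) - 1
    size = 1 << (n - 1).bit_length()

    spectrum = np.fft.rfft(master, size) * np.fft.rfft(snippet[::-1], size)
    corr = np.fft.irfft(spectrum, size)[:n]

    # In the full convolution, index len(snippet) - 1 corresponds to the
    # snippet starting at master sample 0.
    return int(np.argmax(corr)) - (len(snippet) - 1)


# Synthetic stand-in for real audio: the "other" track is the master
# recording started 5000 samples late.
rng = np.random.default_rng(0)
master = rng.standard_normal(48_000)
other = master[5_000:]

# Pretend voice segmentation picked a snippet at this position in `other`.
snippet_start = 10_000
snippet = other[snippet_start:snippet_start + 2_000]

# How many samples `other` has to be shifted to line up with the master.
offset = find_offset(master, snippet) - snippet_start
print(offset)  # 5000
```

The raw correlation peak is not normalized, so on real recordings a very loud passage in the master can outscore the true match; picking a snippet with clear speech, as described above, helps with that.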

@map very nice! if I understand correctly that does account for clocks that speed up / slow down, right? e.g. the temperature changes and one track might be in sync in the beginning and end but drift out-of-sync in the middle

@map we had this phenomenon even sitting at the same desk with 4 USB microphones. I could have used a solution like this to reduce manual stretching and compressing of time

@lastfuture no, I don't control for stretching as it isn't an issue usually™. This takes care of syncing up tracks to a master so I have a synced separate track for each speaker so I can do crossgating and stuff like that.
