It works like this: First I calculate a minimal overlapping area of audio (because sometimes some recordings start way earlier than others), and pick a small area to sample in the audio I need to sync. That sample gets further reduced to a 2-second snippet where someone is actually speaking, found via voice segmentation. After that I search for that snippet in the master track via FFT convolution. Then it's just calculating offsets and cutting WAV files: fully automatic double-ender. I'm so happy!
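
For anyone curious what the matching step might look like, here is a rough sketch in Python with NumPy. It is not the actual code behind the tool, just one way to do it under some assumptions: both tracks are already loaded as mono float arrays at the same sample rate, the voice segmentation has already picked where the snippet starts, and the function names `find_snippet_offset` and `track_offset` are made up for illustration. The match is a plain (unnormalized) cross-correlation, computed as an FFT convolution with the time-reversed snippet.

```python
import numpy as np


def find_snippet_offset(master, snippet):
    """Return the sample index in `master` where `snippet` correlates best.

    Hypothetical helper: cross-correlation done as an FFT convolution of
    the master track with the time-reversed snippet.
    """
    master = np.asarray(master, dtype=np.float64)
    snippet = np.asarray(snippet, dtype=np.float64)
    if len(snippet) == 0 or len(snippet) > len(master):
        raise ValueError("snippet must be non-empty and no longer than master")

    # Length of the full linear convolution, padded to a power of two
    # so the circular FFT product does not wrap around.
    n = len(master) + len(snippet) - 1
    nfft = 1 << (n - 1).bit_length()

    spectrum = np.fft.rfft(master, nfft) * np.fft.rfft(snippet[::-1], nfft)
    corr = np.fft.irfft(spectrum, nfft)[:n]

    # Keep only positions where the snippet lies fully inside the master.
    # Index 0 of `valid` corresponds to the snippet starting at master[0].
    valid = corr[len(snippet) - 1:len(master)]
    return int(np.argmax(valid))


def track_offset(master, track, snippet_start, snippet_len):
    """Return how many samples after the master's start `track` begins.

    `snippet_start` and `snippet_len` are assumed to come from an earlier
    voice-segmentation step (not shown here). A negative result means the
    track started recording before the master did.
    """
    snippet = track[snippet_start:snippet_start + snippet_len]
    match = find_snippet_offset(master, snippet)
    return match - snippet_start
```

With a 2-second snippet at 44.1 kHz, `snippet_len` would be 88200 samples. The returned offset tells you how many samples to trim from (or pad onto) the start of the track before cutting the WAV files. Because the correlation is not normalized, a very loud passage in the master can outscore the true match, so a normalized variant may be safer on real recordings.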