Why Nothing Sounds Quite Like LyreBird
Fans of RTVS may remember Joshua, beautiful baby boy of wayneradiotv. If you're like me, you might be wondering why Joshua stayed dead after LyreBird shut down. Why couldn't he be brought back with a different TTS? The fact is that LyreBird was a product of a very specific time in AI TTS. In March 2017, Google released a paper on Tacotron [1], one of the first AI TTS systems with real success. In April 2017, LyreBird began showing off their TTS business [2]. As AI bros are wont to do, they took that shit. LyreBird was a version of Tacotron. It incorporated technologies that would be published in the next few Tacotron papers [3], including multi-speaker support, prosody encoding, and prosody prediction. And in December 2017, Tacotron 2 came out [4].
Tacotron 2 is better in every way. It's faster, better at imitation, and simpler. This makes it much more economical to run and to fine-tune on a specific speaker, so every subsequent AI TTS is based on Tacotron 2.
If you read the paper, Tacotron 1 has a lot of arbitrary and untested choices. It's clear that they published it in a hurry to prove that it could be done, but they hadn't refined it to cut the unnecessary fluff.
This brings me to why I'm writing this. I hope it's clear that I did a lot of research for this. That's because I did my best to recreate LyreBird. My recreation is named LyingBard, and I've put it up for you to play with here.
You may notice, though, that it's not quite right. The main reason is that I had to go with a low-quality version (reduction factor 5, for those who read the paper). A high-quality version would take too long to train with my current setup, and I'm almost certain a high-quality version is what they used.
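For anyone who hasn't read the paper: the reduction factor r is how many spectrogram frames the decoder spits out per step. Here's a minimal sketch of what that does to the amount of work, in plain Python. The function names and the 320-frame utterance are made up for illustration, not taken from Tacotron's or LyingBard's actual code.

```python
import math


def decoder_steps(n_frames: int, r: int) -> int:
    """Decoder steps needed to emit n_frames when each step emits r frames."""
    return math.ceil(n_frames / r)


def group_frames(frames, r, pad=0.0):
    """Pack frames into chunks of r, padding the last chunk so it is full."""
    padded = list(frames) + [pad] * (-len(frames) % r)
    return [padded[i:i + r] for i in range(0, len(padded), r)]


# Hypothetical utterance of 320 spectrogram frames (illustrative number only).
n_frames = 320
for r in (2, 5):
    print(f"r={r}: {decoder_steps(n_frames, r)} decoder steps")
```

Fewer decoder steps means faster training, which is why I went with 5. The trade-off, as far as I understand it, is that each step has to guess more audio at once, so the output comes out rougher.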
If I got about $100 in donations, I'm pretty sure I could get a high quality version trained in about a month. It still wouldn't sound exactly the same. Due to the chaotic nature of training a neural network, anything short of getting the actual files off LyreBird (now Descript's) servers won't make it sound exactly the same.
Regardless, LyingBard is here to stay. It's hosted on a free server so I have no reason to take it down. I'll be posting about updates here on this blog. I'm working towards getting custom voices ready at the moment and I've got some ideas for new features and fun toys for the future.
Thanks for reading!
Here are some sources if you wanna learn more about stuff I mentioned:
[1] https://arxiv.org/abs/1703.10135
[2] https://www.pcmag.com/news/lyrebird-can-listen-and-copy-any-voice-in-one-minute
[3] https://google.github.io/tacotron/
[4] https://arxiv.org/abs/1712.05884













