haxxiy said:
For the SOTA models, speech synthesis has been indistinguishable to the average person from real voices for years now. Having voice actors generate half a minute of 48 kHz samples would be enough nowadays for all purposes, let alone just saying someone's name here and there. You probably can't do the latter locally, though; it would have to be all cloud-based.
In the film/TV industry we sometimes use trained AI voices of actors as placeholders until we can get ADR from the actual actors who should be delivering those lines. Not only do they sound, both technically and artistically, as if they were voiced by real actors, but on not-so-rare occasions they sound better than the actual ADR once we finally receive it.
Last edited by HoloDust - 2 hours ago






