Microsoft’s new VALL-E “neural codec language model” can closely simulate a person’s voice when given just a three-second audio sample. Once it learns a specific voice, it can synthesise audio of that person saying anything, complete with the right emotional tone. Pair VALL-E with another generative AI model such as GPT-3 and you have a content […]