Microsoft launches AI tool to make still images talk - with eerily realistic results

Microsoft admits the tool could be misused

(Web Desk) - The boundary between what's real and what's not is becoming ever thinner thanks to a new AI tool from Microsoft.

Called VASA-1, the technology transforms a still image of a person's face into an animated clip of them talking or singing.

Lip movements are 'exquisitely synchronised' with audio to make it seem like the subject has come to life, the tech giant claims.

In one example, Leonardo da Vinci's 16th-century masterpiece 'The Mona Lisa' starts rapping crudely in an American accent.

However, Microsoft admits the tool could be 'misused for impersonating humans' and is not releasing it to the public.

VASA-1 takes a static image of a face – whether it's a photo of a real person or an artwork or drawing of someone fictional.

It then 'meticulously' matches this up with audio of speech 'from any person' to make the face come to life.

The AI was trained on a library of facial expressions, which even lets it animate the still image in real time – as the audio is being spoken.
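Microsoft has not released VASA-1, so there is no public API to call. Purely as an illustration of the pipeline described above – one face image plus speech audio in, animated video frames out in real time – here is a minimal, entirely hypothetical Python sketch; every class and function name in it is invented for this example and stands in for a model that is not publicly available.

```python
# Hypothetical sketch only: VASA-1 has not been released, so every class
# and function here is invented to illustrate the pipeline the article
# describes (one face image + speech audio in, video frames out,
# generated in real time as the audio arrives).

from dataclasses import dataclass
from typing import Iterator


@dataclass
class Frame:
    """One generated video frame of the animated face."""
    timestamp_s: float
    description: str


def animate_stream(face_image: bytes, audio_chunks: Iterator[bytes],
                   fps: int = 25) -> Iterator[Frame]:
    """Yield animated frames as audio chunks arrive (model stand-in).

    A real system would condition a generative model on the face image
    once, then map each incoming chunk of speech audio to synchronised
    lip and head motion. Here we simply emit placeholder frames at the
    target frame rate to show the streaming, real-time shape of the idea.
    """
    t = 0.0
    for chunk in audio_chunks:
        yield Frame(timestamp_s=round(t, 3),
                    description=f"face posed for {len(chunk)}-byte audio chunk")
        t += 1.0 / fps


if __name__ == "__main__":
    portrait = b"<still image bytes>"   # e.g. a photo or the Mona Lisa
    speech = iter([b"chunk-1", b"chunk-2", b"chunk-3"])  # streamed audio
    for frame in animate_stream(portrait, speech):
        print(frame.timestamp_s, frame.description)
```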

In a blog post, Microsoft researchers describe VASA as a 'framework for generating lifelike talking faces of virtual characters'.

'It paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors,' they say.

'Our method is capable of not only producing precise lip-audio synchronisation, but also capturing a large spectrum of emotions and expressive facial nuances and natural head motions that contribute to the perception of realism and liveliness.'

In terms of use cases, the team thinks VASA-1 could enable digital AI avatars to 'engage with us in ways that are as natural and intuitive as interactions with real humans'.

Beyond impersonation, another potential risk is fraud, as people online could be duped by a fake video message that appears to come from someone they trust.

Jake Moore, a security specialist at ESET, said 'seeing is most definitely not believing anymore'.

'As this technology improves, it is a race against time to make sure everyone is fully aware of what it is capable of and that they should think twice before they accept correspondence as genuine,' he told MailOnline.

Anticipating concerns that the public might have, the Microsoft experts said VASA-1 is 'not intended to create content that is used to mislead or deceive'.

'However, like other related content generation techniques, it could still potentially be misused for impersonating humans,' they add.

'We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection.

'Currently, the videos generated by this method still contain identifiable artifacts, and the numerical analysis shows that there's still a gap to achieve the authenticity of real videos.'

Microsoft admits that existing techniques are still far from 'achieving the authenticity of natural talking faces', but the capability of AI is growing rapidly.