With fake news at an all-time high, credible news sources and valuable information are becoming increasingly scarce. Because the internet lets anyone publish a story and an image, individuals must decide for themselves whether what they see is genuine or fake. Video, however, has consistently been a reliable point of reference, because a quote or an action captured on video can be attributed to the person shown. Unfortunately, this is about to change. Deep Video Portraits technology can transfer the full 3D head position, head rotation, facial expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor, rendering results in real time that are virtually indistinguishable from genuine footage.
Deep Video Portraits is a form of face recognition technology, the kind used in familiar applications such as Snapchat filters and digital camera autofocus. Big-budget films have also made heavy use of computer-generated imagery (CGI) to swap or alter faces, a notable example being the clones of Agent Smith in the Matrix films.
Developed by Michael Zollhöfer and his colleagues, Deep Video Portraits uses a neural network that takes synthetic renderings of a parametric face model as input and predicts photorealistic video frames for a given target actor. To make the results photorealistic, the network is trained adversarially to produce modified target videos that mimic the behaviour of the synthetically created input. For source-to-target video re-animation, a synthetic target video is rendered with the head animation parameters reconstructed from a source video, and this render is fed into the trained network.
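The re-animation step and the adversarial training described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `FaceParams` fields, the `transfer` helper, the log-loss form of the adversarial terms, and the L1 weight of 100 are all hypothetical choices made for the example, and the renderer and the network itself are left out.

```python
import math
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class FaceParams:
    """Hypothetical container for reconstructed parametric face model values."""
    identity: List[float]      # who the person is (shape, appearance)
    illumination: List[float]  # scene lighting
    head_pose: List[float]     # 3D head position and rotation
    expression: List[float]    # facial expression coefficients
    eye_gaze: List[float]      # gaze direction and blink state


def transfer(source: FaceParams, target: FaceParams) -> FaceParams:
    """Keep the target's identity and lighting; take the motion from the source."""
    return replace(
        target,
        head_pose=source.head_pose,
        expression=source.expression,
        eye_gaze=source.eye_gaze,
    )


def generator_loss(disc_prob_fake: float, predicted: List[float],
                   real: List[float], l1_weight: float = 100.0) -> float:
    """Assumed objective: fool the discriminator and stay close to the real frame."""
    eps = 1e-12  # avoids log(0)
    adversarial = -math.log(disc_prob_fake + eps)
    l1 = sum(abs(p - r) for p, r in zip(predicted, real)) / len(real)
    return adversarial + l1_weight * l1


def discriminator_loss(disc_prob_real: float, disc_prob_fake: float) -> float:
    """Assumed objective: score real frames high and generated frames low."""
    eps = 1e-12
    return -(math.log(disc_prob_real + eps)
             + math.log(1.0 - disc_prob_fake + eps))


# Toy values: the mixed parameters would be rendered synthetically and the
# render fed to the trained network to produce the final frame.
source = FaceParams([0.1], [0.5], [0.0, 0.2], [0.9], [0.3])
target = FaceParams([0.7], [0.4], [0.1, 0.0], [0.2], [0.0])
mixed = transfer(source, target)
```

In this sketch, `mixed` keeps the target's `identity` and `illumination` while its pose, expression, and eye values come from the source, which is the sense in which the target is "re-animated". During training, the two loss functions would be minimised in alternation, one for the frame-predicting network and one for the discriminator that judges its output.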
Prototype audio software such as Adobe Voco makes it possible to edit and generate speech. Given approximately 20 minutes of a target’s speech, the software can generate an accurate imitation of that voice, including new words and phrases, simply by typing the desired text. By combining Voco with Deep Video Portraits, a video of a specific target can be made to say virtually anything, and the naked eye cannot tell whether it is genuine or fake.
Although fraud detection and watermarking algorithms exist to determine whether a video has been manipulated, Deep Video Portraits technology will inevitably improve over time, and fakes will gradually become harder to detect.
Read the full paper at the link below.
https://web.stanford.edu/~zollhoef/papers/SG2018_DeepVideo/paper.pdf