As an independent research designer and visual speculator, Lenka Hamosova (together with Pavol Rusnak) is experimenting with deepfake technology. According to Lenka, what most internet users understand by the term ‘deepfakes’ is just the tip of the iceberg. She prefers to call the technology ‘synthetic media’, since AI-generated media can do much more than face swaps. It has promising, but not yet realised, potential for various creative industries, advertising, and our everyday communication. For this dossier, we’ve invited her to create a deepfake, or synthetic media piece, that shows us how this technology could be used in the near future.
What’s personalised synthetic advertising?
Lenka Hamosova: The sensational face-swaps, predominantly used in porn and all kinds of scary manipulation scenarios within a political context, completely overshadow the tremendous potential that lies beneath the surface. ‘Synthetic media’ is the broader term for AI-generated media in general. Deep learning models that can generate visuals, sound, and text (and combinations of these) allow for the existence of a completely false reality, one our eyes and ears would not be able to distinguish from our other experiences. The quality of synthetic media has not yet reached hyper-realistic results, but considering the exponential development in this field, it is just a matter of time until we start seeing the first professional applications of synthetic media in our personal daily lives. One way we can see this manifesting is in personalised synthetic advertising.
Personalised marketing has become a successful strategy to deliver individualised messages and product offerings to customers, based on collected data and its analysis. If you use social media, you’re already being targeted with personalised commercial content. After you’ve browsed Zalando’s webshop, Zalando ads show up on your Facebook page. Or chatting about your period with your friend in Messenger results in YouTube ads for menstrual products. Imagine that these creepy personalised ads go one step further: commercials not only targeted at you, but tailor-made for you. In that YouTube advertisement, the feminine sanitary product is not recommended by a random model, but by the friend you’re chatting with. 🤯
This is very possible with the potential of video and sound synthesis in deepfakes, and it is likely to happen sooner or later. Personalised synthetic advertising would not only increase the chance that the offered product is what you’re looking for; it might make you more likely to buy it because of the seductive power of familiarity. Seeing your mom’s face popping up from your phone screen, or on the latest smart kitchen device, recommending the best brand of butter while you’re making her famous apple pie recipe might actually be more helpful than annoying. Seeing your friends hanging out in the park with bottles of Heineken might not only make you crave beer, but inspire you to organise a BBQ and socialise more. An interesting question is whether this is purely evil or not. The non-consensual use of someone’s face definitely crosses the line of privacy, but on the other hand such precedents have already been set. Our society will probably get used, once again, to losing just a little bit more of our private data.
Personalised synthetic advertising could be implemented in familiar places for advertisements, such as social media feeds. But these ads could also pop up in more unconventional locations. This is what Pavol and I explored in our futuristic deepfake video. Being in quarantine ourselves, we took this situation, in which all our social life has moved online, as a starting point. The synthetic advertising is placed inside a video call, where it is triggered by a specific phrase. In our scenario, the phrase “dinner tonight” triggers an automatic fullscreen ad for Uber Eats. Recently, a lot has been written about the data collected by some of the video conferencing apps we’ve all been using. Technically it would be possible to take the call’s video as a source for ‘deepfaking’ the call’s participants and directly applying their faces to the personalised advertisement. Our video shows this scenario.
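The trigger mechanism itself is simple to sketch. Here is a minimal Python illustration, assuming the call audio has already been transcribed to text; the phrase and the Uber Eats ad come from our scenario, while the function and ad names are hypothetical and no real video-conferencing API is involved:

```python
# Hypothetical sketch of a phrase-triggered ad inside a video call.
# Assumes speech has already been transcribed to text.

TRIGGERS = {
    "dinner tonight": "uber_eats_fullscreen_ad",
}

def check_transcript(transcript):
    """Return the ad to trigger for a chunk of transcribed speech, or None."""
    text = transcript.lower()
    for phrase, ad_id in TRIGGERS.items():
        if phrase in text:
            return ad_id
    return None

print(check_transcript("So, what shall we do for dinner tonight?"))
# -> uber_eats_fullscreen_ad
```

A real system would of course run speech-to-text continuously and match far fuzzier intent than an exact phrase, but the logic is this simple at its core.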
The scenario of the video
It’s Sunday evening at 19:00, during the COVID-19 pandemic. Because of widespread lockdown measures, people have resorted to connecting with each other mainly via video conferencing tools. The two friends in the video, Lenka from Prague and Lisa from Utrecht, call each other via Room, which has become the most popular video conferencing app, and chat about their experiences with the lockdown and the other things you talk about with friends.
Watch the video to see what happens next:
Behind the scenes: how is the video made?
The deepfake was made with DeepFaceLab, currently the most common software used for celebrity face-swaps. Many pre-trained models based on celebrity datasets are already available, for example at the MrDeepFake Forums (warning: NSFW!).
Usually, people aim to take a celebrity’s face and place it on an unknown actor (for example in deepfake porn or with deepfaked politicians). For this video we did the exact opposite: we took the face of an ordinary person and placed it on the footage of a celebrity.
The process starts with having a source and a destination video. For the source, we filmed a two-minute video where Lenka keeps speaking random sentences and the camera moves around to capture her face from various angles.
For the destination, we used an Uber Eats Commercial starring Kim Kardashian.
The first step of production is to use DeepFaceLab to process both videos and create so-called ‘aligned videos’, which are cropped to focus only on the face area. (Notice how it aligns the position of the eyes in every frame.) In the process, DeepFaceLab also creates a debug video where you can inspect exactly how the face is detected and how the video is cropped.
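The ‘aligned’ crop can be understood as a similarity transform (rotation, scale, and translation) that moves the detected eye positions to fixed canonical coordinates in every frame. Below is a toy numpy sketch of that idea; the canonical eye positions are made up for illustration, and this is the concept only, not DeepFaceLab’s actual alignment code:

```python
import numpy as np

def eye_alignment_transform(left_eye, right_eye, out_size=256):
    """Similarity transform (rotation + scale + translation) that maps
    detected eye positions to fixed canonical positions in the output
    crop, so eyes land in the same place in every frame. The canonical
    coordinates here are illustrative, not DeepFaceLab's."""
    # Canonical eye positions inside the out_size x out_size crop
    dst_l = np.array([0.35 * out_size, 0.40 * out_size])
    dst_r = np.array([0.65 * out_size, 0.40 * out_size])

    src_vec = np.asarray(right_eye, float) - np.asarray(left_eye, float)
    dst_vec = dst_r - dst_l
    # Rotation + uniform scale expressed as one complex division
    m = complex(*dst_vec) / complex(*src_vec)   # scale * e^{i*angle}
    a, b = m.real, m.imag
    A = np.array([[a, -b], [b, a]])
    t = dst_l - A @ np.asarray(left_eye, float)
    return A, t  # crop coords = A @ frame coords + t
```

By construction, both detected eyes map exactly onto the canonical positions, so a tilted or off-centre face ends up level and centred in the crop.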
Once we have the crop-aligned videos, we can feed their individual frames into DeepFaceLab’s training process. During training we can inspect the output and stop once the differences between iterations are barely visible. You need a decent graphics card to do this (for example, a 1080 Ti GPU), and the full process takes around 10 to 30 hours to run.
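That stopping rule, ‘train until the differences between iterations are barely visible’, amounts to a convergence check on the training loss. A toy illustration with a tiny linear autoencoder in numpy follows; DeepFaceLab’s real training uses deep convolutional networks, so this only shows the shape of the loop, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))            # stand-in for face-crop features

# Toy linear autoencoder: 16 dims -> 4 dims -> 16 dims
W_enc = rng.normal(scale=0.1, size=(16, 4))
W_dec = rng.normal(scale=0.1, size=(4, 16))

def loss(W_enc, W_dec):
    recon = X @ W_enc @ W_dec
    return float(np.mean((recon - X) ** 2))

lr, tol = 0.05, 1e-8
losses = [loss(W_enc, W_dec)]
for i in range(20_000):
    recon = X @ W_enc @ W_dec
    err = 2.0 * (recon - X) / X.size      # d(loss) / d(recon)
    g_dec = (X @ W_enc).T @ err
    g_enc = X.T @ (err @ W_dec.T)
    W_enc -= lr * g_enc
    W_dec -= lr * g_dec
    losses.append(loss(W_enc, W_dec))
    # Stop once the improvement between iterations is "barely visible"
    if abs(losses[-2] - losses[-1]) < tol:
        break

print(f"final loss {losses[-1]:.3f} after {len(losses) - 1} iterations")
```

In DeepFaceLab you make the same call by eye, watching the preview window instead of a loss number.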
When we are satisfied with the result, we can use the “merge” operation to generate the faces and apply them to the destination video.
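Conceptually, merging pastes each generated face back into the destination frame: every frame pixel is mapped through the alignment transform, and wherever it lands inside the synthesised crop, the crop’s colour is sampled. Here is a deliberately naive numpy sketch of that idea (nearest-neighbour sampling, no blending or colour matching, unlike DeepFaceLab’s real merger):

```python
import numpy as np

def paste_back(frame, face_crop, A, t):
    """Paste an aligned face crop into `frame`, where the alignment
    transform maps frame coordinates x to crop coordinates A @ x + t.
    Nearest-neighbour, purely illustrative."""
    out = frame.copy()
    h, w = frame.shape[:2]
    ch, cw = face_crop.shape[:2]
    for y in range(h):
        for x in range(w):
            u, v = A @ np.array([x, y], dtype=float) + t
            ui, vi = int(round(u)), int(round(v))
            if 0 <= ui < cw and 0 <= vi < ch:
                out[y, x] = face_crop[vi, ui]  # crop indexed (row, col)
    return out

# Identity transform: the crop lands in the frame's top-left corner
frame = np.zeros((8, 8))
crop = np.ones((3, 3))
merged = paste_back(frame, crop, np.eye(2), np.zeros(2))
```

A production merger additionally feathers the seam and matches colour and lighting, which is where most of the visual quality comes from.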
And that’s it! The deepfake video is ready!
As you can see, the process is not that hard to follow. If you decide to work with pre-trained models, you can achieve great results quite fast. Keep in mind that the video we created is a speculative scenario: implementing this exact process in live video calls would not work, since DeepFaceLab simply cannot produce real-time results.* But that does not mean another tool will not be able to do so soon, especially if it is developed by a company with enough financial and human resources. It is important, both as creators and as users, to understand the logic behind how synthetic media are made. The rest is just a matter of time and (someone’s) energy.
*UPDATE: While this was being written, Avatarify, an open-source tool for creating basic live facial-reenactment deepfakes on Zoom and Skype, was released on GitHub. The tool uses the First Order Motion Model for Image Animation, which takes a driving video and merges it with a single destination image to generate a new animated video. This could take us one step closer to real-time video-call deepfakes; however, at the moment it needs an extremely powerful graphics card to run (a 1080 Ti GPU can generate 33 fps, while a better-than-average MacBook would generate only ~1 fps).
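For intuition, those frame rates translate into per-frame time budgets like this (a smooth video call needs roughly 25–30 fps, i.e. well under ~40 ms per frame):

```python
# Convert the reported frame rates into a per-frame time budget
for device, fps in [("1080 Ti GPU", 33), ("average MacBook", 1)]:
    ms_per_frame = 1000.0 / fps
    print(f"{device}: {fps} fps -> {ms_per_frame:.1f} ms per frame")
# -> 1080 Ti GPU: 33 fps -> 30.3 ms per frame
# -> average MacBook: 1 fps -> 1000.0 ms per frame
```

In other words, the GPU has about 30 ms to synthesise each frame and just squeezes into real time, while the laptop overshoots its budget by more than an order of magnitude.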
What do you think about this application of synthetic media in advertising? Lenka would like to hear your opinion. Do you want to learn more about synthetic media and its potential? Check out Lenka’s brainstorming cards.