In early 2024, a Hong Kong-based employee at a multinational firm joined a video meeting with the company’s CFO and other colleagues. The outcome of that meeting was a successful deepfake attack and a financial loss of €23 million. Yet “everything looked real,” the victim told the media.
This isn’t the first deepfake attack and it won’t be the last. A research study from Sumsub revealed that deepfake identity fraud scams surged tenfold between 2022 and 2023, a number expected to rise once again this year. The repercussions of these attacks can be significant – from false testimonies in court to financial fraud to obtaining national security information.
The advancements in deepfake technology require equally advanced mitigation measures, which is why the world needs better tools for deepfake detection. But first, let’s take a step back and look at how deepfake technology became so easily accessible.
With the rapid advancements in AI, various tools are now publicly available that enable people to create a deepfake. This means that anyone who can download an app is suddenly able to employ techniques such as face swap, in which the face of one person is superimposed onto another in an existing video or a live stream. When that is combined with technology to simulate another person’s voice, the attack can go unnoticed, as in the Hong Kong case.
The early deepfake tools were first available to the film industry, enabling them to finish their productions even when the actor could no longer participate in filming, for example. Another positive use is for speech-to-speech translation, in which audio is translated to another language and matched to the lips of the speaker. And we also see positive deepfake uses in the creation of educational content.
However, despite these noble purposes, the use of deepfakes in video conferencing today is more often malicious.
One of the more common deepfake attack methods used in video conferences is the “video injection attack”, in which the attacker modifies the camera data and injects another video stream. A video injection attack requires biometric source material of the person being impersonated, which can often be obtained through social media. From there, the attack can take place during a live video call without anyone noticing.
Defending against these types of attacks takes a combination of measures. First, use a secure video conferencing platform with proper authentication to verify identities. Taking a zero trust security approach can also mitigate the risk. Operating in an air-gapped network can eliminate it, but that is not a viable solution for everyone.
Looking ahead, I believe that we must ramp up the focus on developing and integrating real-time, deep learning-based deepfake detection technology to ensure the audio-visual integrity of the meeting.
Deepfake detection technology looks for signs of inauthenticity in video. Different methods are in use today, such as technology that detects inconsistencies in the video: unnatural blinking, face warping, unusual lip movements, or odd illumination of the face. New and better approaches are also emerging that use heartbeat data derived from video images, large language models, or speech characteristics to detect deepfakes.
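To make the blinking cue more concrete, here is a minimal Python sketch of one well-known way to measure it, the eye aspect ratio (EAR). This is an illustration only, not a description of any particular product. It assumes a separate face-landmark model has already produced six (x, y) points per eye for each frame, and the function names, thresholds, and blink-rate bounds are all hypothetical values chosen for the example.

```python
import math

def eye_aspect_ratio(eye):
    """Eye aspect ratio for six (x, y) landmarks ordered p1..p6 around the eye.

    p1 and p4 are the eye corners; p2, p3 lie on the upper lid and
    p6, p5 on the lower lid. The ratio drops toward 0 as the eye closes.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, closed_threshold=0.2, min_closed_frames=2):
    """Count runs of at least min_closed_frames frames with EAR below the threshold.

    Both parameters are illustrative assumptions, not validated values.
    """
    blinks = 0
    closed_run = 0
    for ear in ear_series:
        if ear < closed_threshold:
            closed_run += 1
        else:
            if closed_run >= min_closed_frames:
                blinks += 1
            closed_run = 0
    # Count a blink that is still in progress when the series ends.
    if closed_run >= min_closed_frames:
        blinks += 1
    return blinks

def blink_rate_is_suspicious(ear_series, fps, min_per_minute=5.0, max_per_minute=40.0):
    """Flag a clip whose blink rate falls outside an assumed plausible range."""
    minutes = len(ear_series) / fps / 60.0
    if minutes == 0:
        return False
    rate = count_blinks(ear_series) / minutes
    return rate < min_per_minute or rate > max_per_minute
```

A check like this is only a weak signal on its own. A participant who looks away, or a generator that has learned to blink convincingly, will defeat it.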
Combining different methods and extending them with interactive approaches can yield much higher detection accuracy, which is why we need to continue pursuing these defense methods to keep pace with the increasing sophistication of deepfake technology.
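One simple way to picture such a combination is a weighted average of the scores from several independent detectors, with an interactive challenge (for example, asking the participant to turn their head) requested only when the combined score is ambiguous. The Python sketch below is an assumption-laden illustration: the `DetectorResult` type, the weights, and both thresholds are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectorResult:
    """Output of one hypothetical detector (visual, voice, heartbeat, ...)."""
    name: str
    fake_probability: float  # 0.0 = looks authentic, 1.0 = looks fake
    weight: float = 1.0      # assumed relative trust in this detector

def fuse_scores(results):
    """Weighted average of the detectors' fake probabilities."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one detector with positive weight is required")
    return sum(r.fake_probability * r.weight for r in results) / total_weight

def decide(results, challenge_passed=None, flag_threshold=0.7, challenge_threshold=0.4):
    """Return "allow", "challenge", or "flag".

    challenge_passed is None until an interactive challenge has been run;
    the thresholds are illustrative assumptions.
    """
    score = fuse_scores(results)
    if score >= flag_threshold:
        return "flag"
    if score >= challenge_threshold:
        if challenge_passed is None:
            return "challenge"
        return "allow" if challenge_passed else "flag"
    return "allow"

# Example usage with made-up scores.
results = [
    DetectorResult("visual", 0.8, weight=2.0),
    DetectorResult("voice", 0.2),
    DetectorResult("heartbeat", 0.5),
]
print(decide(results))                          # ambiguous score, so ask for a challenge
print(decide(results, challenge_passed=False))  # failed challenge escalates to a flag
```

Keeping the interactive step for ambiguous cases limits how often legitimate participants are interrupted, which matters for the usability concerns raised later in this article.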
Collaboration is the key, as no organization or nation can solve this challenge alone. It’s a good sign that there is ongoing legislation in Europe and the US related to AI and the use of deepfakes.
Through my role at Pexip, I have also started to collaborate with other experts to evaluate detection methods from different perspectives. Researching reliable detection methods is of course necessary, but it is also important to look into different integration approaches. For example, the requirements for integrating detection technologies are completely different when end-to-end encryption is used. We must also keep the perspective of the user in mind to achieve usability and transparency in terms of the technology.
Pexip is a leader in secure meeting solutions, so we aim to contribute to the growth and maturity of deepfake defense mechanisms. We work with customers in highly regulated industries who often have strict requirements. Meeting these often takes a combination of security measures, such as zero trust architecture, self-hosted solutions, and the integration of deepfake detection technology from specialists.
I believe that with the right regulation, a collaborative approach, and continual advancement in detection technologies, organizations will be much better equipped to protect themselves and preserve the integrity of their video meetings.