Low Latency Video Conferencing: Breaking the Sound Barrier

Written by Håvard Graff, Principal Engineer | Nov 24, 2020 1:30:41 PM

This year has seen video conference usage skyrocket. With a lot more people meeting online, some of the weaknesses of today's technology are becoming more apparent, such as people interrupting each other at strange times:

 

"You go ahead."

"No, sorry, you go ahead."

"Ok, as I was saying… "

"Sure, my point was that…"

 

This will be a familiar experience for frequent users of video. At the core of the problem is that sound takes a lot longer to travel across the internet than most people realize.

 

To illustrate this point, let's imagine a normal conversation in a meeting room. Typically, those speaking would be less than 5 meters apart. Since sound travels roughly 1 meter in 3 milliseconds (ms), the sound delay, or latency as it's called, is at most around 0.015 seconds, or 15ms. This is not a lot, and the human brain perceives it as instantaneous. But at what point does latency become perceptible?

 

To explore this, we sought help from the music world. To play together in sync, and to feel a common rhythm or pulse, musicians need the latency to be very low. Thinking in terms of distance, most musicians would agree that being more than 10 meters apart makes it very hard to play together. This means that anything more than 30ms of latency is going to make it hard to play in sync with a common pulse.

 

Let's relate this back to the video world, where it's common to experience latencies of up to 500ms. That is the equivalent of two people yelling at each other from roughly 170 meters apart! (A soccer field is typically about 100m long.) Even in the best-case scenario with the technologies available today, you're still looking at around 200ms, which is over 60 meters. No wonder it is hard to keep a natural conversation going!
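To make the distance comparison concrete, here is a minimal sketch of the conversion between latency and the equivalent distance sound travels through air. It assumes a speed of sound of 343 meters per second (dry air at about 20 °C), which is close to the "1 meter in 3ms" rule of thumb used above. The function names are our own, chosen for illustration.

```python
# Assumption: speed of sound in dry air at about 20 °C.
SPEED_OF_SOUND_M_PER_S = 343.0


def distance_to_latency_ms(distance_m: float) -> float:
    """Time in milliseconds for sound to travel distance_m through air."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def latency_ms_to_distance_m(latency_ms: float) -> float:
    """Distance in meters that sound travels through air in latency_ms."""
    return latency_ms / 1000.0 * SPEED_OF_SOUND_M_PER_S


# Room-scale distances mentioned in the text.
for meters in (5, 10):
    print(f"{meters} m apart -> {distance_to_latency_ms(meters):.1f} ms")

# Video conferencing latencies mentioned in the text.
for latency in (30, 200, 250, 500):
    print(f"{latency} ms -> {latency_ms_to_distance_m(latency):.1f} m")
```

Under this assumption, 500ms works out to roughly 170 meters and 200ms to a little under 70 meters, in the same ballpark as the figures quoted above.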

 

To address this, we started the Pexip Ultra-Low Latency project, working tirelessly to bring the latency down to a point where it feels like being in the same room as the person you are talking to. To validate our research, we asked a group of musicians to try to play together using our technology. There were a few key requirements:

  1. Only if the latency was "ultra-low" would the musicians be able to find a common pulse and play together as if they were sitting in the same room.
  2. Anything more than 30ms (equivalent of 10 meters) of latency would make them uncomfortable and start to affect the performance.
  3. Raising the bar even higher, a comfortable experience for the musicians would mean achieving clean, high-quality sound with no noise, clicks or pops, all while keeping the latency stable and ultra-low.
  4. To make it even more difficult, the Pexip solution works with a central component that all media passes through. This meant that it would not be enough to achieve 30ms of latency from one point to another. We needed to get the sound into our backend, mix it with the other performers' sound, and send it back out again, all in less than 30ms.
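One way to picture the last requirement is as a latency budget: every stage on the path from one performer, through the central component, and back out to another performer has to fit inside the same 30ms. The sketch below adds up a set of stage timings and checks them against that budget. The stage names and numbers are purely hypothetical, chosen for illustration; they are not measurements from the Pexip project.

```python
# The 30 ms target comes from the text; everything else here is hypothetical.
BUDGET_MS = 30.0


def total_latency_ms(stages: dict[str, float]) -> float:
    """Sum the latency contributed by each stage of the audio path."""
    return sum(stages.values())


def within_budget(stages: dict[str, float], budget_ms: float = BUDGET_MS) -> bool:
    """True if the whole path, including the central mix, stays under budget."""
    return total_latency_ms(stages) < budget_ms


# Hypothetical stage timings in milliseconds (illustrative only).
example_stages = {
    "capture_and_encode": 5.0,
    "uplink_to_backend": 6.0,
    "mix_in_backend": 3.0,
    "downlink_to_performer": 6.0,
    "decode_and_playback": 5.0,
}

print(f"total: {total_latency_ms(example_stages):.1f} ms")
print(f"within {BUDGET_MS:.0f} ms budget: {within_budget(example_stages)}")
```

The point of the sketch is that the budget is shared: any time spent in one stage, such as the central mix, has to be saved somewhere else on the path.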

Ultra-low latency technology allowed the musicians to play Edvard Grieg's Holberg Suite perfectly in sync. It was a risky musical choice: the piece is very rhythmic, and if the quartet were even slightly out of sync it would be immediately obvious and would not sound very pleasant.

 

Playing completely in sync over live video is almost impossible without optimized latency. When the experiment was conducted without ultra-low latency, at a more typical level of about 250ms, the result was a string quartet noticeably out of sync 😆

 

The ultra-low latency experiment video is available on YouTube.

Read more about how we ensure our customers have the highest quality meeting experience here: https://www.pexip.com/technology-program