ClearBuds: First wireless earbuds that clear up calls using deep learning
UW News – Mon, 11 Jul 2022
ClearBuds pair a novel microphone system with one of the first machine-learning systems to operate in real time and run on a smartphone. Photo: Raymond Smith/University of Washington

As meetings shifted online during the COVID-19 lockdown, many people found that chattering roommates, garbage trucks and other loud sounds disrupted important conversations.

This experience inspired three University of Washington researchers, who were roommates during the pandemic, to develop better earbuds. To enhance the speaker’s voice and reduce background noise, “ClearBuds” use a novel microphone system and one of the first machine-learning systems to operate in real time and run on a smartphone.

The researchers presented their findings at the ACM International Conference on Mobile Systems, Applications, and Services (MobiSys).

“ClearBuds differentiate themselves from other wireless earbuds in two key ways,” said co-lead author , a doctoral student in the Paul G. Allen School of Computer Science & Engineering. “First, ClearBuds use a dual microphone array. Microphones in each earbud create two synchronized audio streams that provide information and allow us to spatially separate sounds coming from different directions with higher resolution. Second, the lightweight neural network further enhances the speaker’s voice.”

While most commercial earbuds also have microphones on each earbud, only one earbud is actively sending audio to a phone at a time. With ClearBuds, each earbud sends a stream of audio to the phone. The researchers designed Bluetooth networking protocols to allow these streams to be synchronized within 70 microseconds of each other.
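That synchronization requirement can be illustrated with a small sketch. This is not the team’s Bluetooth protocol, just a hypothetical timestamp-based alignment in Python; the 16 kHz sample rate is an assumption.

```python
import numpy as np

SAMPLE_RATE = 16_000        # assumed earbud audio rate
SYNC_TOLERANCE_US = 70      # the synchronization budget described above

def align_streams(left, right, t_left_us, t_right_us):
    """Trim two timestamped audio buffers to a common start time.

    t_left_us / t_right_us are the capture timestamps (in microseconds)
    of each buffer's first sample.
    """
    offset_us = t_right_us - t_left_us
    # Convert the timestamp offset into whole samples and drop the
    # leading samples of whichever stream started earlier.
    offset_samples = round(abs(offset_us) * SAMPLE_RATE / 1e6)
    if offset_us > 0:       # left started earlier
        left = left[offset_samples:]
    elif offset_us < 0:     # right started earlier
        right = right[offset_samples:]
    n = min(len(left), len(right))
    # Residual misalignment after snapping to the nearest whole sample.
    residual_us = abs(abs(offset_us) - offset_samples * 1e6 / SAMPLE_RATE)
    return left[:n], right[:n], residual_us <= SYNC_TOLERANCE_US
```

At 16 kHz one sample spans 62.5 µs, so trimming to the nearest sample leaves at most about 31 µs of residual offset, inside the 70 µs budget.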

The team’s neural network algorithm runs on the phone to process the audio streams. First, it suppresses any non-voice sounds. Then it isolates and enhances any sound that arrives at both earbuds at the same time: the speaker’s voice.

“Because the speaker’s voice is close by and approximately equidistant from the two earbuds, the neural network can be trained to focus on just their speech and eliminate background sounds, including other voices,” said co-lead author , a doctoral student in the Allen School. “This method is quite similar to how your own ears work. They use the time difference between sounds coming to your left and right ears to determine which direction a sound came from.”
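The time-difference cue described here can be demonstrated with a classic cross-correlation estimate. This is an illustrative sketch of time-difference-of-arrival estimation, not the ClearBuds network itself:

```python
import numpy as np

def estimate_tdoa(left, right, sample_rate):
    """Estimate the time difference of arrival at two microphones from
    the lag that maximizes their cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(right) - 1)
    return lag_samples / sample_rate

# A speaker equidistant from both earbuds arrives with ~zero delay:
fs = 16_000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 220 * t)
print(estimate_tdoa(voice, voice, fs))  # 0.0 -> sound is straight ahead
```

A nonzero result means the source is off to one side; the sign tells you which ear the sound reached first.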

Shown here: the ClearBuds hardware (round disk) in front of the 3D-printed earbud enclosures. Photo: Raymond Smith/University of Washington

When the researchers compared ClearBuds with Apple AirPods Pro, ClearBuds performed better, achieving a higher signal-to-distortion ratio across all tests.
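Signal-to-distortion ratio compares the energy of the clean reference signal with the energy of whatever error remains in the enhanced output, in decibels, so higher is better. A minimal sketch of that metric (the paper’s evaluation may differ in details):

```python
import numpy as np

def signal_to_distortion_ratio(reference, estimate):
    """SDR in dB: power of the clean reference over the power of the
    residual error in the estimate. Higher means less distortion."""
    error = np.asarray(reference) - np.asarray(estimate)
    return 10 * np.log10(np.sum(np.square(reference)) / np.sum(np.square(error)))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 100, 16_000))
lightly_noisy = clean + 0.1 * rng.standard_normal(16_000)
heavily_noisy = clean + 0.3 * rng.standard_normal(16_000)
print(signal_to_distortion_ratio(clean, lightly_noisy) >
      signal_to_distortion_ratio(clean, heavily_noisy))  # True
```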

“It’s extraordinary when you consider the fact that our neural network has to run in less than 20 milliseconds on an iPhone that has a fraction of the computing power compared to a large commercial graphics card, which is typically used to run neural networks,” said co-lead author , a doctoral student in the Allen School. “That’s part of the challenge we had to address in this paper: How do we take a traditional neural network and reduce its size while preserving the quality of the output?”
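That 20-millisecond constraint is easy to check empirically for any candidate model. Here is a hypothetical latency check with a trivial stand-in for the enhancement network; the chunk size assumes 16 kHz audio, and everything besides the 20 ms budget is made up for the sketch:

```python
import time

LATENCY_BUDGET_S = 0.020   # the 20 ms per-chunk deadline described above

def fits_realtime_budget(process, chunk, runs=50):
    """Time process(chunk) repeatedly; return the median latency and
    whether it fits inside the real-time budget."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        process(chunk)
        timings.append(time.perf_counter() - start)
    timings.sort()
    median = timings[runs // 2]
    return median, median <= LATENCY_BUDGET_S

# Trivial stand-in model on one 20 ms chunk (320 samples at 16 kHz):
median, ok = fits_realtime_budget(lambda c: [x * 0.5 for x in c], list(range(320)))
print(f"{median * 1e3:.3f} ms, meets budget: {ok}")
```

Using the median rather than the mean keeps one-off scheduler hiccups from dominating the measurement.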

The team also tested ClearBuds “in the wild” by recording eight people reading aloud in noisy environments, such as a coffee shop or on a busy street. The researchers then had 37 people rate 10- to 60-second clips of these recordings. Participants rated clips that were processed through ClearBuds’ neural network as having the best noise suppression and the best overall listening experience.

  • For more information, see the team’s project website.
  • The hardware and software design for ClearBuds is open source.

One limitation of ClearBuds is that people have to wear both earbuds to get the noise suppression experience, the researchers said.

But the real-time communication system developed here can be useful for a variety of other applications, the team said, including smart-home speakers, tracking robot locations or search and rescue missions.

The team is currently working on making the neural network algorithms even more efficient so that they can run on the earbuds themselves.

Additional co-authors are , an associate professor in the Allen School; , a professor in both the Allen School and the electrical and computer engineering department; and and , both professors in the Allen School. This research was funded by the National Science Foundation and the UW Reality Lab.

For more information, contact the team at clearbuds@cs.washington.edu.

UW researchers can turn a single photo into a video
UW News – Mon, 14 Jun 2021


UW researchers have created a deep learning method that can animate flowing material, such as waterfalls, smoke or clouds. Shown here is Snoqualmie Falls animated using the team’s method. (original photo: Sarah McQuate/University of Washington)


Sometimes photos cannot truly capture a scene. How much more epic would that vacation photo of Niagara Falls be if the water were moving?


Researchers at the 91̽»¨ have developed a deep learning method that can do just that: If given a single photo of a waterfall, the system creates a video showing that water cascading down. All that’s missing is the roar of the water and the feeling of the spray on your face.

The team’s method can animate any flowing material, including smoke and clouds. This technique produces a short video that loops seamlessly, giving the impression of endless movement. The researchers presented this approach June 22 at the Conference on Computer Vision and Pattern Recognition (CVPR).

“A picture captures a moment frozen in time. But a lot of information is lost in a static image. What led to this moment, and how are things changing? Think about the last time you found yourself fixated on something really interesting — chances are, it wasn’t totally static,” said lead author Hołyński, a doctoral student in the Paul G. Allen School of Computer Science & Engineering.

“What’s special about our method is that it doesn’t require any user input or extra information,” Hołyński said. “All you need is a picture. And it produces as output a high-resolution, seamlessly looping video that quite often looks like a real video.”

Eastern Washington’s Palouse Falls animated using the team’s method. (original photo: Sarah McQuate/University of Washington)

Developing a method that turns a single photo into a believable video has been a challenge for the field.

“It effectively requires you to predict the future,” Hołyński said. “And in the real world, there are nearly infinite possibilities of what might happen next.”

The team’s system consists of two parts: First, it predicts how things were moving when a photo was taken, and then uses that information to create the animation.

To estimate motion, the team trained a neural network with thousands of videos of waterfalls, rivers, oceans and other material with fluid motion. The training process consisted of asking the network to guess the motion of a video when only given the first frame. After comparing its prediction with the actual video, the network learned to identify clues — ripples in a stream, for example — to help it predict what happened next. Then the team’s system uses that information to determine if and how each pixel should move.
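The paper’s network is deep and trained on thousands of real videos; as a toy illustration of the same guess-compare-adjust loop, here is a linear model trained by gradient descent to predict per-pixel motion from synthetic “image features.” All names, shapes, and data here are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 500 pixels, each with 8 local features
# (think ripples, edges, gradients) and a ground-truth (dx, dy) motion.
n_pixels, n_features = 500, 8
features = rng.standard_normal((n_pixels, n_features))
true_weights = rng.standard_normal((n_features, 2))
true_flow = features @ true_weights

weights = np.zeros((n_features, 2))
learning_rate = 0.05
for step in range(500):
    predicted_flow = features @ weights       # guess motion from the first frame
    error = predicted_flow - true_flow        # compare with the actual motion
    weights -= learning_rate * features.T @ error / n_pixels   # adjust the model
mse = float(np.mean((features @ weights - true_flow) ** 2))
print(mse)  # shrinks toward 0 as the model learns the motion cues
```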

The researchers tried to use a technique called “splatting” to animate the photo. This method moves each pixel according to its predicted motion. But this created a problem.

“Think about a flowing waterfall,” Hołyński said. “If you just move the pixels down the waterfall, after a few frames of the video, you’ll have no pixels at the top!”

So the team created “symmetric splatting.” Essentially, the method predicts both the future and the past for an image and then combines them into one animation.

“Looking back at the waterfall example, if we move into the past, the pixels will move up the waterfall. So we will start to see a hole near the bottom,” Hołyński said. “We integrate information from both of these animations so there are never any glaringly large holes in our warped images.”
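The idea can be sketched on a 1-D “image.” This toy version only illustrates the hole-filling logic; the actual method works on 2-D frames with learned per-pixel flow:

```python
import numpy as np

def splat(frame, flow, t):
    """Forward-warp a 1-D 'image' by t steps of per-pixel motion.
    Positions nothing lands on become holes (NaN)."""
    out = np.full_like(frame, np.nan, dtype=float)
    for i, value in enumerate(frame):
        j = i + int(round(flow[i] * t))
        if 0 <= j < len(frame):
            out[j] = value
    return out

def symmetric_splat(frame, flow, t, n_frames):
    """Blend a warp t steps into the future with a warp n_frames - t
    steps into the past, so holes in one are covered by the other."""
    future = splat(frame, flow, t)
    past = splat(frame, flow, t - n_frames)  # negative t warps backward
    w_future = 1 - t / n_frames              # short warps are more reliable
    blended = np.where(np.isnan(future), past, future)
    both = ~np.isnan(future) & ~np.isnan(past)
    blended[both] = w_future * future[both] + (1 - w_future) * past[both]
    return blended
```

With uniform downward flow, the future warp leaves holes at the top and the past warp leaves holes at the bottom, so the blend is hole-free, just as the quote describes.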

To animate the image, the team created “symmetric splatting,” which predicts both the future and the past for an image and then combines them into one animation. Photo: Hołyński et al./CVPR

Finally, the researchers wanted their animation to loop seamlessly to create a look of continuous movement. The animation network follows a few tricks to keep things clean, including transitioning different parts of the frame at different times and deciding how quickly or slowly to blend each pixel depending on its surroundings.
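A simplified, whole-frame version of that looping trick is a crossfade from the clip’s tail back into its head; the team’s method additionally transitions different regions of the frame at different times, which this sketch omits:

```python
import numpy as np

def crossfade_loop(frames, fade_len):
    """Return a seamlessly looping clip of length len(frames) - fade_len.

    The first fade_len output frames blend the video's tail into its
    head, so the last output frame flows smoothly back to the first.
    """
    T = len(frames)
    L = T - fade_len
    out = frames[:L].astype(float)   # astype copies, so the input is untouched
    for t in range(fade_len):
        w = t / fade_len             # 0 right after the seam, ramping up
        out[t] = (1 - w) * frames[t + L] + w * frames[t]
    return out
```

Because `out[0]` equals `frames[L]`, the jump from the last output frame (`frames[L-1]`) back to the first is just one step of the original video, so the loop point is invisible.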

The team’s method works best for objects with predictable fluid motion. Currently, the technology struggles to predict how reflections should move or how water distorts the appearance of objects beneath it.


“When we see a waterfall, we know how the water should behave. The same is true for fire or smoke. These types of motions obey the same set of physical laws, and there are usually cues in the image that tell us how things should be moving,” Hołyński said. “We’d love to extend our work to operate on a wider range of objects, like animating a person’s hair blowing in the wind. I’m hoping that eventually the pictures that we share with our friends and family won’t be static images. Instead, they’ll all be dynamic animations like the ones our method produces.”

Co-authors are and , both professors in the Allen School, and , an affiliate professor in the Allen School. This research was funded by the UW Reality Lab, Facebook, Google, Futurewei and Amazon.

For more information, contact Hołyński at holynski@cs.washington.edu.
