Tiny cameras in earbuds let users talk with AI about what they see /news/2026/04/14/cameras-in-wireless-earbuds-vuebuds/ Tue, 14 Apr 2026 14:38:00 +0000

Two black earbuds: one with the casing removed, exposing a computer chip and a tiny camera.
UW researchers developed a system called VueBuds that uses tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. Here, the altered earbuds are shown with the camera inserted. Photo: Kim et al./CHI ‘26

UW researchers developed the first system that incorporates tiny cameras into off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. For instance, a user might turn toward a Korean food package and say, “Hey Vue, translate this for me.” They’d then hear an AI voice say, “The visible text translates to ‘Cold Noodles’ in English.”

The prototype system, called VueBuds, captures low-resolution, black-and-white images and transmits them over Bluetooth to a phone or other nearby device. A small artificial intelligence model on that device then answers questions about the images within about a second. For privacy, all of the processing happens on the device, a small light turns on when the system is recording, and users can immediately delete images.

The team will present its findings April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI ’26) in Barcelona.

“We haven’t seen most people adopt smart glasses or VR headsets, in part because a lot of people don’t like wearing glasses, and they often come with privacy concerns, such as recording high-resolution video and processing it in the cloud,” said senior author , a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But almost everyone wears earbuds already, so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process.”

Cameras use far more power than the microphones already in earbuds, so the high-resolution cameras found in smart glasses wouldn’t work. Bluetooth also can’t stream large amounts of data continuously, so the system can’t run continuous video.

The team found that using a low-power camera — roughly the size of a grain of rice — to shoot low-resolution, black-and-white still images limited battery drain and allowed for Bluetooth transmission while preserving performance.
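Some back-of-envelope arithmetic illustrates the trade-off. The resolution and throughput figures below are our own illustrative assumptions, not numbers from the paper:

```python
# Rough check (assumed numbers) of why low-resolution grayscale stills
# fit over Bluetooth while continuous video would not.

BLE_THROUGHPUT_BPS = 1_000_000          # rough usable Bluetooth LE throughput
frame_bits = 240 * 240 * 8              # assumed 240x240 frame, 8 bits/pixel

still_seconds = frame_bits / BLE_THROUGHPUT_BPS
video_bps = frame_bits * 30             # the same frames as 30 fps video

print(f"one still transfers in ~{still_seconds:.2f} s")      # ~0.46 s
print(f"30 fps video would need {video_bps / 1e6:.1f} Mbps")  # ~13.8 Mbps
```

Under these assumptions a single still fits comfortably in the time budget, while uncompressed video would exceed the link’s capacity by an order of magnitude.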

There was also the matter of placement. 

“One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user’s view of the world reliably?” said lead author , who completed this work as a UW doctoral student in the Allen School.

The team found that angling each camera 5 to 10 degrees outward provides a combined 98- to 108-degree field of view. This creates a small blind spot for objects held closer than 20 centimeters, but people rarely hold things that close to examine them, making it a non-issue for typical interactions.
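The reported range follows directly from the tilt geometry: rotating each camera outward by t degrees widens the combined horizontal coverage by 2t, as long as the two views still overlap in the middle. A quick check, assuming a per-camera field of view of 88 degrees (our inference from the reported range, not a figure from the paper):

```python
# Combined horizontal coverage when each earbud camera is tilted outward.
# PER_CAMERA_FOV is an assumption inferred from the reported 98-108 range.
PER_CAMERA_FOV = 88

for tilt in (5, 10):
    combined = PER_CAMERA_FOV + 2 * tilt   # each side gains `tilt` degrees
    print(f"{tilt} deg tilt -> {combined} deg combined")  # 98 and 108
```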

Researchers also discovered that while the vision language model was largely able to make sense of the images from each earbud, having to process images from both earbuds slowed it down. So they had the system “stitch” the two images into one, identifying overlapping imagery and combining it. This allows the system to respond in one second — quick enough to feel like real-time for users — rather than the two seconds it takes with separate images.
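A toy version of that stitching idea (our own construction, not the authors’ algorithm) can locate the shared columns between two grayscale frames by minimizing pixel differences, then merge the frames at the best overlap:

```python
# Toy stitcher: find how many columns two grayscale frames share, then
# merge them into one wider frame. Frames are lists of pixel rows.

def stitch(left, right):
    """Merge two equal-height frames by their best column overlap."""
    h, wl = len(left), len(left[0])
    best_ov, best_err = 0, float("inf")
    for ov in range(1, min(wl, len(right[0])) + 1):
        # compare the last `ov` columns of `left` with the first `ov` of `right`
        err = sum(abs(left[r][wl - ov + c] - right[r][c])
                  for r in range(h) for c in range(ov)) / ov
        if err < best_err:
            best_err, best_ov = err, ov
    return [lrow[:wl - best_ov] + rrow for lrow, rrow in zip(left, right)]

# Two 2x4 frames sharing their middle two columns:
L = [[1, 2, 3, 4], [5, 6, 7, 8]]
R = [[3, 4, 9, 9], [7, 8, 9, 9]]
print(stitch(L, R))  # [[1, 2, 3, 4, 9, 9], [5, 6, 7, 8, 9, 9]]
```

A real system would match features rather than raw pixels, but the principle is the same: find the overlap once, then hand the model a single combined image.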

The team then had 74 participants compare recorded outputs from VueBuds with outputs from Ray-Ban Meta glasses in a series of tests. Despite VueBuds using low-resolution images with greater privacy controls while the Ray-Bans took high-resolution images processed in the cloud, the two systems performed comparably overall: participants preferred VueBuds’ translations, while the Ray-Bans did better at counting objects.

Sixteen participants also wore VueBuds and tested the system’s ability to translate and answer basic questions about objects. VueBuds achieved 83-84% accuracy when translating or identifying objects and 93% when identifying the author and title of a book.

This study was designed to gauge the feasibility of integrating cameras in wireless earbuds. Since the system only takes grayscale images, it can’t answer questions that involve color in the scene. 

The team wants to add color to the system — color cameras require more power — and to train specialized AI models for specific use cases, such as translation.  

“This study lets us glimpse what’s possible just using a general purpose language model and our wireless earbuds with cameras,” Kim said. “But we’d like to study the system more rigorously for applications like reading a book — for people who have low vision or are blind, for instance — or translating text for travelers.” 

Co-authors include , a UW master’s student in the Allen School, and , , , and , all UW students in electrical and computer engineering.

For more information, contact vuebuds@cs.washington.edu.

A smart ring with a tiny camera lets users point and click to control home devices /news/2025/01/08/smart-ring-camera-iris/ Wed, 08 Jan 2025 17:05:27 +0000

While smart devices in homes have grown to include speakers, security systems, lights and thermostats, the ways to control them have remained relatively stable. Users can interact with a phone, or talk to the tech, but these are frequently less convenient than the simple switches they replace: “Turn on the lamp…. Not that one…. Turn up the speaker volume…. Not that loud!”

UW researchers have developed IRIS, a smart ring that allows users to control smart devices by aiming the ring’s small camera at a device and clicking a built-in button. The prototype Bluetooth ring sends an image of the selected device to the user’s phone, which controls the device. The user can adjust the device with the button and — for devices with gradient controls, such as a speaker’s volume — by rotating their hand. IRIS, or Interactive Ring for Interfacing with Smart home devices, runs for 16 to 24 hours on a charge.
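For gradient controls, one natural mapping (hypothetical, not taken from the paper) turns relative hand roll into a clamped value change:

```python
# Hypothetical mapping (ours, not the paper's) from hand rotation to a
# gradient control such as speaker volume, clamped to the device's range.

def adjust(value, roll_degrees, step_per_degree=0.5, lo=0, hi=100):
    """Translate a relative hand roll into a clamped volume change."""
    return max(lo, min(hi, value + roll_degrees * step_per_degree))

volume = adjust(40, +30)      # rotate 30 degrees clockwise
print(volume)                 # 55.0
volume = adjust(volume, -120) # large counter-rotation
print(volume)                 # clamped at 0
```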

The team presented its research Oct. 16 at the 37th Annual ACM Symposium on User Interface Software and Technology in Pittsburgh. IRIS is not currently available to the public.

“Voice commands can often be really cumbersome,” said co-lead author , a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “We wanted to create something that’s as simple and intuitive as clicking on an icon on your computer desktop.”

A white ring beside a circuit board and a quarter
UW researchers have developed IRIS, a smart ring that allows users to point and click to control smart devices. Here, the ring (left) is shown beside its circuit board and battery and a quarter. Photo: Kim et al./UIST ‘24

The team decided to put the system in a ring because they believed users would realistically wear one throughout the day. The challenge, then, was integrating a camera into a wireless smart ring given its size and power constraints. The system also had to toggle devices in under a second; otherwise, users tend to think it isn’t working.

To achieve this, researchers had the ring compress the images before sending them to a phone. Rather than streaming images continuously, the ring activates when the user clicks the button, then turns off after three seconds of inactivity.
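The duty-cycling described above can be sketched as a tiny state machine (our own construction; the names and the 3-second constant mirror the article, everything else is illustrative):

```python
# Minimal sketch of click-to-wake duty cycling: the camera powers on
# when the button is clicked and powers down after 3 s without activity.

IDLE_TIMEOUT_S = 3.0

class RingCamera:
    def __init__(self):
        self.active = False
        self.last_event = 0.0

    def on_click(self, now):
        self.active = True
        self.last_event = now      # any click restarts the idle timer

    def tick(self, now):
        if self.active and now - self.last_event >= IDLE_TIMEOUT_S:
            self.active = False    # power the camera back down

ring = RingCamera()
ring.on_click(now=0.0)
ring.tick(now=2.9); print(ring.active)  # True  (still within 3 s)
ring.tick(now=3.1); print(ring.active)  # False (timed out)
```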


In a study with 23 participants, twice as many users preferred IRIS over a voice command system alone (in this case, Apple’s Siri). On average, IRIS controlled home devices more than two seconds faster than voice commands.

“In the future, integrating the IRIS camera system into a health-tracking smart ring would be a transformative step for smart rings,” Kim said. “It’d let smart rings actually augment or improve human capability, rather than just telling you your step count or heart rate.”

, — both UW doctoral students in the Allen School — were co-lead authors on the study, and , a UW professor in the Allen School, was the senior author. Additional co-authors include , a UW research assistant in the Allen School; , a UW undergraduate in the Allen School; , a UW master’s student in the Allen School; and , a UW professor in the Allen School. This research was funded by a Moore Inventor Fellow award and the National Science Foundation.

For more information, contact iris@cs.washington.edu.

ClearBuds: First wireless earbuds that clear up calls using deep learning /news/2022/07/11/clearbuds-first-wireless-earbuds-clear-calls-deep-learning/ Mon, 11 Jul 2022 15:55:49 +0000
ClearBuds use a novel microphone system and are one of the first machine-learning systems to operate in real time and run on a smartphone. Photo: Raymond Smith/UW

As meetings shifted online during the COVID-19 lockdown, many people found that chattering roommates, garbage trucks and other loud sounds disrupted important conversations.

This experience inspired three UW researchers, who were roommates during the pandemic, to develop better earbuds. To enhance the speaker’s voice and reduce background noise, “ClearBuds” use a novel microphone system and one of the first machine-learning systems to operate in real time and run on a smartphone.

The researchers presented their work at the ACM International Conference on Mobile Systems, Applications, and Services.

“ClearBuds differentiate themselves from other wireless earbuds in two key ways,” said co-lead author , a doctoral student in the Paul G. Allen School of Computer Science & Engineering. “First, ClearBuds use a dual microphone array. Microphones in each earbud create two synchronized audio streams that allow us to spatially separate sounds coming from different directions with higher resolution. Second, the lightweight neural network further enhances the speaker’s voice.”

While most commercial earbuds also have microphones on each earbud, only one earbud is actively sending audio to a phone at a time. With ClearBuds, each earbud sends a stream of audio to the phone. The researchers designed Bluetooth networking protocols to allow these streams to be synchronized within 70 microseconds of each other.
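The synchronization budget is tight: at a 44.1 kHz sampling rate, one sample lasts about 23 microseconds, so 70 microseconds is roughly three samples. A toy illustration (our own, not the paper’s protocol) shows how a lag between the two streams can be measured by cross-correlation:

```python
import math

# Estimate the lag between two earbud audio streams by naive
# cross-correlation. We simulate the right stream arriving 3 samples
# late and recover that offset.

RATE = 44_100
LAG = 3

tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(1000)]
left = tone
right = [0.0] * LAG + tone            # delayed copy of the same audio

def best_lag(a, b, max_lag=8):
    """Shift of `b` (in samples) that best lines it up with `a`."""
    n_cmp = len(a) - max_lag
    return max(range(max_lag + 1),
               key=lambda k: sum(a[n] * b[n + k] for n in range(n_cmp)))

print(best_lag(left, right))          # 3
```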

The team’s neural network algorithm runs on the phone to process the audio streams. First it suppresses any non-voice sounds. Then it isolates and enhances any sound arriving at both earbuds at the same time: the speaker’s voice.

“Because the speaker’s voice is close by and approximately equidistant from the two earbuds, the neural network can be trained to focus on just their speech and eliminate background sounds, including other voices,” said co-lead author , a doctoral student in the Allen School. “This method is quite similar to how your own ears work. They use the time difference between sounds reaching your left and right ears to determine which direction a sound came from.”
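The zero-time-difference cue described in the quote is the same one classic delay-and-sum beamforming exploits. This toy (our construction, not the ClearBuds network) simply averages the two channels: a voice arriving at both earbuds simultaneously adds coherently, while off-axis noise that reaches one earbud a few samples later partly cancels.

```python
import math

RATE = 44_100
N = 2000

def rms(samples):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def beamform(freq_hz, inter_channel_lag):
    """Average left/right channels for a tone with the given lag (samples)."""
    left = [math.sin(2 * math.pi * freq_hz * n / RATE) for n in range(N)]
    right = [math.sin(2 * math.pi * freq_hz * (n - inter_channel_lag) / RATE)
             for n in range(N)]
    return [(l + r) / 2 for l, r in zip(left, right)]

voice = beamform(300, 0)    # equidistant speaker: zero lag, passes intact
noise = beamform(4000, 5)   # off-axis noise: 5-sample lag, attenuated
print(rms(voice), rms(noise))
```

With these assumed frequencies and lag, the voice passes at nearly full strength while the noise is attenuated several-fold; the neural network learns a far more sophisticated version of this separation.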

Shown here, the ClearBuds hardware (round disk) in front of the 3D-printed earbud enclosures. Photo: Raymond Smith/UW

When the researchers compared ClearBuds with Apple AirPods Pro, ClearBuds performed better, achieving a higher signal-to-distortion ratio across all tests.

“It’s extraordinary when you consider the fact that our neural network has to run in less than 20 milliseconds on an iPhone, which has a fraction of the computing power of a large commercial graphics card typically used to run neural networks,” said co-lead author , a doctoral student in the Allen School. “That’s part of the challenge we had to address in this paper: How do we take a traditional neural network and reduce its size while preserving the quality of the output?”

The team also tested ClearBuds “in the wild,” recording eight people reading aloud in noisy environments, such as a coffee shop or on a busy street. The researchers then had 37 people rate 10- to 60-second clips of these recordings. Participants rated clips processed through ClearBuds’ neural network as having the best noise suppression and the best overall listening experience.

  • The hardware and software design for ClearBuds is open source.

One limitation of ClearBuds is that people have to wear both earbuds to get the noise suppression experience, the researchers said.

But the real-time communication system developed here can be useful for a variety of other applications, the team said, including smart-home speakers, tracking robot locations, and search-and-rescue missions.

The team is currently working on making the neural network algorithms even more efficient so that they can run on the earbuds themselves.

Additional co-authors are , an associate professor in the Allen School; , a professor in both the Allen School and the electrical and computer engineering department; and and , both professors in the Allen School. This research was funded by the National Science Foundation and the UW’s Reality Lab.

For more information, contact the team at clearbuds@cs.washington.edu.
