
Big Tech's race into augmented reality (AR) grows more competitive by the day. This month, Meta released the latest version of its headset, the Quest 3. Early next year, Apple will release its first headset, the Vision Pro. The announcements for each platform emphasize experiences that merge the virtual and physical worlds: a digital board game superimposed on a coffee table, a movie screen projected above airplane seats.

Some researchers, though, are more curious about other uses for AR. One lab at the University of Washington is applying these budding technologies to assist people with disabilities. This month, researchers from the lab will introduce multiple projects that deploy AR, through headsets and phone apps, to make the world more accessible.

Researchers from the lab will present RASSAR, an app that can scan homes to highlight accessibility and safety issues, on Oct. 23 at a conference in New York.

Shortly after, on Oct. 30, other teams in the lab will present early research at a conference in San Francisco. One app helps people speak more naturally with voice assistants, and the other aims to make sports more accessible for low-vision users.

UW News spoke with the three studies' lead authors, Jae Lee and Xia Su, both UW doctoral students in the Paul G. Allen School of Computer Science & Engineering, about their work and the future of AR for accessibility.

What is AR and how is it typically used right now?

Jae Lee: I think one commonly accepted answer is that you use a wearable headset or a phone to superimpose virtual objects in a physical environment. A lot of people probably know AR from "Pokémon Go," where you're superimposing these Pokémon into the physical world. Now Apple and Meta are introducing "mixed reality," or passthrough AR, which further blends the physical and virtual worlds through cameras.

Xia Su: Something I have also been observing lately is people are trying to expand the definition beyond goggles and phone screens. There could be AR audio, which is manipulating your hearing, or devices trying to manipulate your smell or touch.

In Augmented Reality (AR) a headset or phone superimposes virtual objects in a physical space. In Virtual Reality (VR) a headset or goggles immerses the user in a virtual environment. Mixed Reality (MR) blends the physical and virtual worlds.

A lot of people associate AR with virtual reality, and it gets wrapped up in discussion of the metaverse and gaming. How is it being applied for accessibility?

JL: AR as a concept has been around for several decades. But in Jon Froehlich's lab, we're combining AR with accessibility research. A headset or a phone can, for example, detect how many people are in front of us. For people who are blind or low vision, that information could be critical to how they perceive the world.

XS: There are really two different routes for AR accessibility research. The more prevalent one is trying to make AR devices more accessible to people. The other, less common approach is asking: How can we use AR or VR as tools to improve the accessibility of the real world? That's what we're focused on.

JL: As AR glasses become less bulky and cheaper, and as AI and computer vision advance, this research will become increasingly important. But widespread AR, even for accessibility, brings up a lot of questions. How do you deal with bystander privacy? We, as a society, understand that vision technology can be beneficial to blind and low-vision people. But we also might not want to include facial recognition technology in apps for privacy reasons, even if that helps someone recognize their friends.

Let's talk about the papers you have coming out. First, can you explain RASSAR?

XS: It's an app that people can use to scan their indoor spaces and detect possible accessibility and safety issues in their homes. It's possible because some iPhones now have lidar (light detection and ranging) scanners that sense the depth of a space, so we can reconstruct the space in 3D. We combined this with computer vision models to highlight ways to improve safety and accessibility. To use it, someone (perhaps a parent who's childproofing a home, or a caregiver) scans a room with their smartphone and RASSAR spots accessibility problems. For example, if a desk is too high, a red button will pop up on the desk. If the user clicks the button, there will be more information about why that desk's height is an accessibility issue and possible fixes.

JL: Ten years ago, you would have needed to go through 60 pages of PDFs to fully check a house for accessibility. We boiled that information down into an app.
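The idea of turning pages of guidelines into automated checks can be pictured with a toy sketch. This is not RASSAR's actual code; the object format and the desk-height threshold here are made up for illustration:

```python
# Toy illustration of a rule-based accessibility check over objects
# detected in a 3D room scan. The threshold range is hypothetical,
# not taken from any real guideline.

ACCESSIBLE_DESK_HEIGHT_M = (0.71, 0.86)  # made-up acceptable range, in meters

def flag_issues(detected_objects):
    """Return (object type, message) pairs for objects that fail a rule."""
    issues = []
    for obj in detected_objects:
        if obj["type"] == "desk":
            low, high = ACCESSIBLE_DESK_HEIGHT_M
            if not (low <= obj["height_m"] <= high):
                issues.append((obj["type"],
                               f"desk height {obj['height_m']:.2f} m is outside "
                               f"the range {low}-{high} m"))
    return issues

room = [{"type": "desk", "height_m": 1.10}, {"type": "chair", "height_m": 0.45}]
print(flag_issues(room))
```

In a real pipeline, the detected objects and their dimensions would come from the lidar-based 3D reconstruction rather than being typed in by hand.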

And this is something that anyone will be able to download to their phones and use?

XS: That's the eventual goal. We already have a demo. This version relies on lidar, which is only on certain iPhone models right now. But if you have such a device, it's very straightforward.

JL: This is an example of these advancements in hardware and software that let us create apps quickly. Apple announced RoomPlan, which creates a 3D floor plan of a room, when they added the lidar sensor. We're using that in RASSAR to understand the general layout. Being able to build on that lets us come up with a prototype very quickly.


So RASSAR is nearly deployable now. The other areas of research you're presenting are earlier in their development. Can you tell me about GazePointAR?

JL: It's an app that uses an AR headset to enable people to speak more naturally with voice assistants like Siri or Alexa. There are all these pronouns we use when we speak that are difficult for computers to understand without visual context. I can ask "Where'd you buy it from?" But what is "it"? A voice assistant has no idea what I'm talking about. With GazePointAR, the goggles are looking at the environment around the user and the app is tracking the user's gaze and hand movements. The model then tries to make sense of all these inputs: the words, the hand movements, the user's gaze. Then, using a large language model, GPT, it attempts to answer the question.

How does it sense what the motions are?

JL: We're using a headset called HoloLens 2, developed by Microsoft. It has a gaze tracker that's watching your eyes and trying to guess what you're looking at. It has hand tracking capability as well. In a paper that we submitted building on this, we noticed that we have a lot of problems with this. For example, people don't just use one pronoun at a time, we use multiple. We'll say, "What's more expensive, this or this?" To answer that, we need information over time. But, again, you can run into privacy issues if you want to track someone's gaze or someone's visual field of view over time: What information are you storing and where is it being stored? As technology improves, we certainly need to watch out for these privacy concerns, especially in computer vision.

This is difficult even for humans, right? I can ask, "Can you explain that?" while pointing at several equations on a whiteboard and you won't know which I'm referring to. What applications do you see for this?

JL: Being able to use natural language would be major. But if you expand this to accessibility, there's the potential for a blind or low-vision person to use this to describe what's around them. The question "Is anything dangerous in front of me?" is also ambiguous for a voice assistant. But with GazePointAR, ideally, the system could say, "There are possibly dangerous objects, such as knives and scissors." Or low-vision people might make out a shape, point at it, then ask the system what "it" is more specifically.
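The core trick of grounding a pronoun in what the user is looking at can be sketched in a few lines. This is a toy illustration, not GazePointAR itself; the object labels, gaze coordinates, and nearest-object heuristic are all made up for the example:

```python
# Toy sketch of gaze-based pronoun grounding: rewrite an ambiguous
# spoken query by substituting the label of the object nearest to the
# user's gaze point, before handing the query to a language model.
import math

def nearest_object(gaze_xy, objects):
    """Pick the labeled object whose center is closest to the gaze point."""
    return min(objects, key=lambda o: math.dist(gaze_xy, o["center"]))

def ground_pronoun(query, gaze_xy, objects, pronoun="it"):
    """Replace the pronoun in the query with the gazed-at object's label."""
    target = nearest_object(gaze_xy, objects)
    return query.replace(pronoun, target["label"])

objects = [{"label": "the blue mug", "center": (0.2, 0.4)},
           {"label": "the keyboard", "center": (0.8, 0.6)}]
print(ground_pronoun("Where'd you buy it from?", (0.25, 0.45), objects))
```

A real system would get the object labels from computer vision and the gaze point from the headset's eye tracker, and the rewritten query would then be sent to a model like GPT.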


And finally you're working on a system called ARTennis. What is it and what prompted this research?

JL: This is going even more into the future than GazePointAR. ARTennis is a prototype that uses an AR headset to make tennis balls more salient for low-vision players. The ball in play is marked by a red dot and has a crosshair of green arrows around it. Professor Jon Froehlich has a family member who wants to play sports with his children but doesn't have the residual vision necessary to do so. We thought if it works for tennis, it's going to work for a lot of other sports, since tennis has a small ball that shrinks as it gets farther away. If we can track a tennis ball in real time, we can do the same with a bigger, slower basketball.

One of the co-authors on the paper is low vision himself and plays a lot of squash, so he wanted to try this application and give us feedback. We did a lot of brainstorming sessions with him, and he tested the system. The red dot and green crosshairs are the design he came up with to improve the sense of depth perception.

What's keeping this from being something people can use right away?

JL: Well, like GazePointAR, it's relying on a HoloLens 2 headset that's $3,500. So that's a different accessibility issue. It's also running at roughly 25 frames per second, and for humans to perceive in real time it needs to be about 30 frames per second. Sometimes we can't capture the speed of the tennis ball. We're going to expand the paper and include basketball to see if there are different designs people prefer for different sports. The technology will certainly get faster. So our question is: What will the best design be for the people using it?

For more information, contact Lee at jaewook4@cs.washington.edu, Su at xiasu@cs.washington.edu and Jon Froehlich at jonf@cs.washington.edu.