Tiny cameras in earbuds let users talk with AI about what they see /news/2026/04/14/cameras-in-wireless-earbuds-vuebuds/ Tue, 14 Apr 2026 14:38:00 +0000 /news/?p=91232 Two black earbuds: one with the casing removed exposing a computer chip and tiny camera.
UW researchers developed a system called VueBuds that uses tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. Here, the altered earbuds are shown with the camera inserted. Photo: Kim et al./CHI ‘26

University of Washington researchers developed the first system that incorporates tiny cameras in off-the-shelf wireless earbuds to allow users to talk with an AI model about the scene in front of them. For instance, a user might turn to a Korean food package and say, “Hey Vue, translate this for me.” They’d then hear an AI voice say, “The visible text translates to ‘Cold Noodles’ in English.”

The prototype system, called VueBuds, takes low-resolution, black-and-white images, which it transmits over Bluetooth to a phone or other nearby device. A small artificial intelligence model on the device then answers questions about the images within about a second. For privacy, all of the processing happens on the device, a small light turns on when the system is recording, and users can immediately delete images.

The team will present its findings April 14 at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI) in Barcelona.

“We haven’t seen most people adopt smart glasses or VR headsets, in part because a lot of people don’t like wearing glasses, and they often come with privacy concerns, such as recording high-resolution video and processing it in the cloud,” said the senior author, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “But almost everyone wears earbuds already, so we wanted to see if we could put visual intelligence into tiny, low-power earbuds, and also address privacy concerns in the process.”

Cameras use far more power than the microphones already in earbuds, so the same sort of high-resolution cameras found in smart glasses wouldn’t work. Bluetooth also can’t stream large amounts of data continuously, so the system can’t run continuous video.

The team found that using a low-power camera — roughly the size of a grain of rice — to shoot low-resolution, black-and-white still images limited battery drain and allowed for Bluetooth transmission while preserving performance.

There was also the matter of placement.

“One big question we had was: Will your face obscure the view too much? Can earbud cameras capture the user’s view of the world reliably?” said lead author Kim, who completed this work as a UW doctoral student in the Allen School.

The team found that angling each camera 5-10 degrees outward provides a 98-108 degree field of view. While this creates a small blind spot when objects are held closer than 20 centimeters from the user, people rarely hold things that close to examine them — making it a non-issue for typical interactions.
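
The arithmetic behind that range is worth making explicit. On a hedged reading (the camera's exact specification isn't given here, so the per-camera figure below is an inference): if each camera covers a fixed field of view $\theta$ and each is tilted outward by $\alpha \in [5^\circ, 10^\circ]$, the combined horizontal coverage grows by both tilts:

$$\theta_{\text{combined}} = \theta + 2\alpha$$

so the reported 98-108 degree span is consistent with a per-camera field of view of roughly $\theta \approx 88^\circ$.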

Researchers also discovered that while the vision language model was largely able to make sense of the images from each earbud, having to process images from both earbuds slowed it down. So they had the system “stitch” the two images into one, identifying overlapping imagery and combining it. This allows the system to respond in one second — quick enough to feel like real-time for users — rather than the two seconds it takes with separate images.
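
The article doesn't detail the stitching step, but the general technique is standard in computer vision. Here is a minimal sketch, assuming OpenCV-style feature matching; the function and parameter choices are illustrative, not VueBuds' actual pipeline.

```python
# Illustrative sketch (not the VueBuds pipeline): combine two grayscale
# earbud frames by estimating their overlap and warping one onto the other.
import cv2
import numpy as np

def stitch_earbud_frames(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Stitch two overlapping grayscale frames into one wider image."""
    orb = cv2.ORB_create(500)                      # fast binary features
    kp1, des1 = orb.detectAndCompute(left, None)
    kp2, des2 = orb.detectAndCompute(right, None)

    # Match features across the two views and keep the strongest matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Homography maps the right frame into the left frame's coordinates.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = left.shape
    canvas = cv2.warpPerspective(right, H, (w * 2, h))
    canvas[0:h, 0:w] = left                        # overlay the left frame
    return canvas
```

Feeding the model one stitched frame instead of two separate ones halves the visual input it must process, which matches the reported one-second response time.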

The team then had 74 participants compare recorded outputs from VueBuds with outputs from Ray-Ban Meta Glasses in a series of tests. Despite VueBuds using low-resolution images with greater privacy controls and the Ray-Bans taking high-res images processed in the cloud, the two systems performed equivalently. Participants preferred VueBuds’ translations, while the Ray-Bans did better at counting objects.

Sixteen participants also wore VueBuds and tested the system’s ability to translate and answer basic questions about objects. VueBuds achieved 83-84% accuracy when translating or identifying objects and 93% when identifying the author and title of a book.

This study was designed to gauge the feasibility of integrating cameras in wireless earbuds. Since the system only takes grayscale images, it can’t answer questions that involve color in the scene.

The team wants to add color to the system — color cameras require more power — and to train specialized AI models for specific use cases, such as translation. 

“This study lets us glimpse what’s possible just using a general purpose language model and our wireless earbuds with cameras,” Kim said. “But we’d like to study the system more rigorously for applications like reading a book — for people who have low vision or are blind, for instance — or translating text for travelers.” 

Co-authors include a UW master’s student in the Allen School and several UW students in electrical and computer engineering.

For more information, contact vuebuds@cs.washington.edu.

DopFone app can accurately track fetal heart rate using only a smartphone /news/2026/02/26/dopfone-fetal-heart-rate-app/ Thu, 26 Feb 2026 16:58:23 +0000 /news/?p=90704
DopFone uses an off-the-shelf smartphone’s existing speaker and microphone to accurately estimate fetal heart rate. The phone mimics a Doppler ultrasound, emitting a tone and listening for the subtle variations in its echo caused by fetal heartbeats. A machine learning model then estimates the heart rate. Photo: Garg et al./Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies

Heart rate is an important sign of fetal health, yet few technologies exist to easily and inexpensively track fetal heart rates outside of doctors’ offices. This can create risks for pregnancies in low-resource regions where doctors are far away or inaccessible.

A team led by University of Washington researchers has created DopFone, a system that uses an off-the-shelf smartphone’s existing speaker and microphone to accurately estimate fetal heart rate. The phone mimics a Doppler ultrasound, emitting a tone and listening for the subtle variations in its echo caused by fetal heartbeats. A machine learning model then estimates the heart rate. In a clinical test with 23 pregnant women, DopFone estimated heart rate with an average error of 2 beats per minute, or bpm. The accepted clinical range is within 8 bpm.

The team published its findings Dec. 2 in the Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies.

“Eventually DopFone could let people test fetal heart rate regularly, rather than relying on the intermittent tests at a doctor’s office, or not getting tested at all,” said lead author Garg, a UW doctoral student in the Paul G. Allen School of Computer Science & Engineering. “Patients might then send this data to doctors so that they can better judge patients’ health when they’re not in a clinic.”

Traditional Doppler ultrasounds, the clinical standard for fetal heart rate monitoring, work by sending high-frequency sound into a person’s body and tracking how the echo changes in frequency. They’re very accurate at measuring fetal heart rate but require costly equipment and a skilled technician to operate it.

To use DopFone, a user places the phone’s microphone against their abdomen for one minute. The phone emits an 18 kilohertz tone, too high-pitched for most adults to hear. The team chose this comparatively low frequency because — unlike a Doppler’s high frequencies, above 2,000 kilohertz — it sits within the range smartphone microphones can record while still traveling well through tissue. As the tone reflects within the user’s abdomen, the fetus’s heartbeat creates small shifts in the sound.

A machine learning model then estimates the heart rate using the audio and the patient’s demographic information.
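
As a rough illustration of the signal-processing idea (not DopFone's actual model, which uses machine learning and demographic data), one could demodulate the recording around the 18 kHz carrier and look for periodicity in the plausible fetal range. The sampling rate, filter settings and bpm bounds below are assumptions.

```python
# Minimal sketch of the Doppler idea behind DopFone, not the team's model:
# demodulate the microphone signal around the 18 kHz carrier, then look for
# periodicity in the resulting low-frequency motion signal.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 48_000          # assumed smartphone sampling rate, Hz
CARRIER = 18_000     # emitted tone, Hz

def fetal_rate_bpm(mic: np.ndarray) -> float:
    """Estimate beats per minute from a one-minute recording of the echo."""
    t = np.arange(len(mic)) / FS
    # Mix the recording down to baseband around the carrier.
    baseband = mic * np.exp(-2j * np.pi * CARRIER * t)
    # Keep only slow Doppler variations (< 20 Hz), then take the envelope.
    b, a = butter(4, 20 / (FS / 2))
    envelope = np.abs(filtfilt(b, a, baseband))
    motion = envelope[::480]                        # downsample to 100 Hz
    motion = motion - motion.mean()
    # Find the autocorrelation peak in a plausible fetal range (110-160 bpm).
    ac = np.correlate(motion, motion, mode="full")[len(motion):]
    lag_s = np.arange(1, len(ac) + 1) / 100.0       # lag in seconds at 100 Hz
    valid = (lag_s > 60 / 160) & (lag_s < 60 / 110)
    return 60.0 / lag_s[valid][np.argmax(ac[valid])]
```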

The team tested DopFone in UW Medicine’s maternal-fetal medicine division on 23 pregnant patients between 19 and 39 weeks of pregnancy. On average its readings were within 2.1 bpm of the medical Doppler ultrasound. Its accuracy was slightly diminished for patients with high body mass indexes, though those readings were still within normal limits. Because an irregular fetal heartbeat is often an emergency, DopFone was not tested on patients with irregularities.

Next, the team plans to gather more data outside a lab to better train the model. Eventually they want to deploy it as a publicly available app.

“This women’s health space is often overlooked,” Garg said. “So I want to focus on accessible alternatives that can be available to people in low resource areas, whether that’s here in the U.S. or in other countries. Because health belongs to everyone.”

Co-authors include a UW graduate student in electrical and computer engineering; two OB/GYNs in UW Medicine’s maternal-fetal medicine division; and a UW assistant professor in the Allen School. A UW professor in the Allen School and in electrical and computer engineering and a collaborator at the Georgia Institute of Technology were senior authors. This research was funded by the UW Gift Fund.

For more information, contact Garg at pgarg70@uw.edu.

In a study, AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts /news/2026/02/04/in-a-study-ai-model-openscholar-synthesizes-scientific-research-and-cites-sources-as-accurately-as-human-experts/ Wed, 04 Feb 2026 16:02:30 +0000 /news/?p=90533 A screenshot of the OpenScholar demo.
A UW and Ai2 research team built OpenScholar, an open-source AI model designed specifically to synthesize current scientific research. In tests, OpenScholar cited sources as accurately as human experts, and 16 scientists preferred its responses to those written by subject experts 51% of the time. Above is the user interface for a free online demo of the model.

Keeping up with the latest research is vital for scientists, but given the millions of papers published every year, that can prove difficult. Artificial intelligence systems show promise for quickly synthesizing seas of information, but they still tend to make things up, or “hallucinate.”

For instance, when a team led by researchers at the University of Washington and the Allen Institute for AI, or Ai2, studied a recent OpenAI model, they found it fabricated 78-90% of its research citations. And general-purpose AI models like ChatGPT often can’t access papers that were published after their training data was collected.

So the UW and Ai2 research team built OpenScholar, an open-source AI model designed specifically to synthesize current scientific research. The team also created the first large, multi-domain benchmark for evaluating how well models can synthesize and cite scientific research. In tests, OpenScholar cited sources as accurately as human experts, and 16 scientists preferred its responses to those written by subject experts 51% of the time.

The team published its findings Feb. 4 in Nature. The project’s code and demo are publicly available and free to use.

“After we started this work, we put the demo online and quickly, we got a lot of queries, far more than we’d expected,” said senior author Hajishirzi, a UW associate professor in the Paul G. Allen School of Computer Science & Engineering and senior director at Ai2. “When we started looking through the responses we realized our colleagues and other scientists were actively using OpenScholar. It really speaks to the need for this sort of open-source, transparent system that can synthesize research.”

Try the free online demo.

Researchers trained the model and then assembled a datastore of 45 million scientific papers for OpenScholar to pull from, grounding its answers in established research. They coupled this with a technique called “retrieval-augmented generation,” which lets the model search for new sources, incorporate them and cite them after it’s been trained.
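
The retrieval-augmented loop can be sketched schematically. In this sketch, the `embed`, `datastore.search` and `generate` helpers are hypothetical stand-ins, not OpenScholar's actual API.

```python
# Schematic of retrieval-augmented generation as described above. The
# helpers passed in are hypothetical stand-ins for an embedding model,
# a paper datastore and a language model.
from typing import List

def answer_with_citations(query: str, datastore, embed, generate, k: int = 8) -> str:
    # 1. Retrieve: find the k passages whose embeddings best match the query.
    passages: List[dict] = datastore.search(embed(query), top_k=k)

    # 2. Ground: build a prompt that pairs each passage with a citation tag.
    context = "\n".join(
        f"[{i + 1}] {p['title']}: {p['text']}" for i, p in enumerate(passages)
    )
    prompt = (
        "Answer the question using ONLY the sources below, citing them "
        f"as [n].\n\nSources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Generate: the model writes an answer constrained to cite [1]..[k].
    return generate(prompt)
```

Because retrieval happens at question time, the model can cite papers published after its training data was collected, which is the property the team needed.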

“Early on we experimented with using an AI model with Google’s search data, but we found it wasn’t very good on its own,” said lead author Asai, a research scientist at Ai2 who completed this research as a UW doctoral student in the Allen School. “It might cite some research papers that weren’t the most relevant, or cite just one paper, or pull from a blog post randomly. We realized we needed to ground this in scientific papers. We then made the system flexible so that it could incorporate emerging research through search results.”

To test their system, the team created ScholarQABench, a benchmark against which to test systems on scientific search. They gathered 3,000 queries and 250 longform answers written by experts in computer science, physics, biomedicine and neuroscience.

“AI is getting better and better at real world tasks,” Hajishirzi said. “But the big question ultimately is whether we can trust that its answers are correct.”

The team compared OpenScholar against other state-of-the-art AI models, such as OpenAI’s GPT-4o and two models from Meta. ScholarQABench automatically evaluated AI models’ answers on metrics such as their accuracy, writing quality and relevance.

OpenScholar outperformed all the systems it was tested against. The team had 16 scientists review answers from the models and compare them with human-written responses. The scientists preferred OpenScholar answers to human answers 51% of the time, but when the team combined OpenScholar’s citation methods and pipelines with GPT-4o (a much bigger model), the scientists preferred the AI-written answers to human answers 70% of the time. They picked answers from GPT-4o on its own only 32% of the time.

“Scientists see so many papers coming out every day that it’s impossible to keep up,” Asai said. “But the existing AI systems weren’t designed for scientists’ specific needs. We’ve already seen a lot of scientists using OpenScholar and because it’s open-source, others are building on this research and already improving on our results. We’re working on a followup model that builds on OpenScholar’s findings and performs multi-step search and information gathering to produce more comprehensive responses.”

Other co-authors include several UW doctoral students in the Allen School; a UW professor emeritus in the Allen School who is general manager and chief scientist at Ai2; a UW postdoctoral scholar in the Allen School and at Ai2; a UW professor in the Allen School; a UW assistant professor in the Allen School; Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D’Arcy, David Wadden, Matt Latzke, Jenna Sparks and Jena D. Hwang of Ai2; Wen-tau Yih of Meta; Minyang Tian, Shengyan Liu, Hao Tong and Bohao Wu of the University of Illinois Urbana-Champaign; Pan Ji of the University of North Carolina; Yanyu Xiong of Stanford University; and Graham Neubig of Carnegie Mellon University.

For more information, contact Asai at akaria@allenai.org and Hajishirzi at hannaneh@cs.washington.edu.

UW researchers analyzed which anthologized writers and books get checked out the most from Seattle Public Library /news/2026/01/08/seattle-public-library-data-anthologized-writers/ Thu, 08 Jan 2026 17:04:04 +0000 /news/?p=90225
UW researchers analyzed 20 years of checkout data for the 93 authors included in the post-1945 volume of “The Norton Anthology of American Literature,” which is assigned in U.S. English classes more than nearly any other anthology.

Seattle Public Library, or SPL, is the only U.S. library system that makes its anonymized, granular checkout data public. Want to find out how many times people borrowed the e-book version of Toni Morrison’s “Beloved” in May 2018? That data is available.

The hitch is that the library’s data set contains nearly 50 million rows, and a single title can appear in many variant forms. Morrison’s “Beloved,” for instance, is listed as “Beloved,” “Beloved (unabridged),” “Beloved : a novel / by Toni Morrison” and so on.

To track trends in the catalogue over the last 20 years, University of Washington researchers analyzed the checkout data of the 93 authors included in the post-1945 volume of “The Norton Anthology of American Literature.” It’s assigned in U.S. English classes more than virtually any other anthology, so it shapes what’s thought of as the contemporary American canon — the books and writers we’ve deemed culturally important.

The team found that among these vaunted writers — including Morrison, Viet Thanh Nguyen, David Foster Wallace and Joan Didion — science fiction was particularly popular. Ursula K. Le Guin and Octavia E. Butler topped the list.

The team published its findings Nov. 21 in Computational Humanities Research 2025.

Related:

  • A related study looks at how checkouts correspond with book sales and other library circulation

“It’s kind of mind-boggling and ironic that in this age of abundant data, we have so little data about what people are reading, particularly for researchers,” said senior author Walsh, a UW assistant professor in the Information School. “I’ve been obsessed with SPL’s data for years now. But extracting insights from it is actually a really hard computational and bibliographic modeling problem.”

To organize the data, the team used computational methods, such as stripping away subtitles and standardizing punctuation. They also manually identified things like translations of a work.
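
A minimal sketch of that kind of normalization, using the “Beloved” variants above as a test; the exact rules here are assumptions, not the team's published pipeline.

```python
# Illustrative normalization: strip subtitles and statement-of-responsibility
# text, standardize punctuation and case, so variant records collapse to one key.
import re

def normalize_title(raw: str) -> str:
    title = raw.lower()
    title = title.split(" / ")[0]                  # drop "/ by Toni Morrison"
    title = re.split(r"\s*[:;]\s", title)[0]       # drop subtitles after ':' or ';'
    title = re.sub(r"\(.*?\)", "", title)          # drop "(unabridged)" etc.
    title = re.sub(r"[^\w\s]", "", title)          # standardize punctuation away
    return re.sub(r"\s+", " ", title).strip()

for variant in ["Beloved", "Beloved (unabridged)",
                "Beloved : a novel / by Toni Morrison"]:
    assert normalize_title(variant) == "beloved"
```

Rules like these handle the bulk of variants automatically, leaving cases such as translations for the manual review the team describes.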

“We worked with the Norton anthology in part because it’s a small enough scale for us to handle,” said lead author Gupta, a UW doctoral student in the Information School. “It allows us to have a ground truth to work off of. We can still put a human eye on things.”

In all, the team looked at 1,603 works by the 93 authors, which were checked out a total of 980,620 times since 2005.

A line graph shows checkouts of Ursula K. Le Guin increasing over two decades.
This graph follows how many times Ursula K. Le Guin’s books were borrowed since 2005. Photo: Gupta et al./Computational Humanities Research 2025

The 10 top authors were:

  1. Ursula K. Le Guin
  2. Octavia E. Butler
  3. Louise Erdrich
  4. N.K. Jemisin
  5. Toni Morrison
  6. Kurt Vonnegut
  7. George Saunders
  8. Philip K. Dick
  9. Sherman Alexie
  10. James Baldwin

The 10 top books were: 

  1. “Parable of the Sower” by Octavia E. Butler
  2. “Lincoln in the Bardo” by George Saunders
  3. “The Fifth Season” by N.K. Jemisin
  4. “The Sympathizer” by Viet Thanh Nguyen
  5. “Kindred” by Octavia E. Butler
  6. “Beloved” by Toni Morrison
  7. “The Left Hand of Darkness” by Ursula K. Le Guin
  8. “The Absolutely True Diary of a Part-Time Indian” by Sherman Alexie
  9. “The Year of Magical Thinking” by Joan Didion
  10. “The Sentence” by Louise Erdrich

Researchers noted several trends that may have driven checkouts. In general, books with genre and sci-fi elements were some of the most popular.

“I found the prevalence of sci-fi books and writers really interesting,” Gupta said. “These are recent additions to the anthology, since sci-fi and genre fiction haven’t always been seen as important literature. So while it’s a bit unsurprising, it’s also striking to see that despite comprising a small portion of the anthology, these are the authors people are actually reading the most.”

News events also drove spikes in readership, such as film adaptations of James Baldwin’s “If Beale Street Could Talk” and Don DeLillo’s “White Noise,” or the deaths of authors such as Didion, Wallace, Morrison and Philip Roth.

The top book, “Parable of the Sower,” saw a huge spike in readership in 2024 — the year the futuristic novel is set, and the year SPL selected the novel for its Seattle Reads program.

“We’ve deemed these canonical authors important enough to continue reading, to continue teaching, to continue studying and talking about, so it’s fascinating to see who we’re actually reading and when,” Walsh said. “I find it very beautiful that after years of these big debates about diversifying the canon, the works that people are turning to the most are by women and Black and Native writers, who previously were not even included in these anthologies.”

Co-authors include Daniella Maor, Karalee Harris, Emily Backstrom and Hongyuan Dong, all students at the UW.

For more information, contact Walsh at melwalsh@uw.edu and Gupta at ngupta1@uw.edu.

Video: Drivers struggle to multitask when using dashboard touch screens, study finds /news/2025/12/16/video-drivers-struggle-to-multitask-when-using-dashboard-touch-screens-study-finds/ Tue, 16 Dec 2025 17:00:09 +0000 /news/?p=90099

Once the domain of buttons and knobs, car dashboards are increasingly home to large touch screens. While that makes following a mapping app easier, it also means drivers can’t feel their way to a control; they have to look. But how does that visual component affect driving?

New research from the University of Washington and Toyota Research Institute, or TRI, explores how drivers balance driving and using touch screens while distracted. In the study, participants drove in a vehicle simulator, interacted with a touch screen and completed memory tests that mimic the mental effort demanded by traffic conditions and other distractions. The team found that when people multitasked, their driving and touch screen use both suffered. The car drifted more in the lane while people used touch screens, and their speed and accuracy with the screen declined when driving. The effects increased further when the researchers added the memory task.

These results could help auto manufacturers design safer, more responsive touch screens and in-car interfaces.

The team presented its findings Sept. 30 at the ACM Symposium on User Interface Software and Technology in Busan, Korea.

“What about the car’s touch screen? We wanted to understand that interaction so we can design interfaces specifically for drivers,” said co-senior author Fogarty, a UW professor in the Paul G. Allen School of Computer Science & Engineering.

As the study’s 16 participants drove the simulator, sensors tracked their gaze, finger movements, pupil diameter and electrodermal activity. The last two are common ways to measure mental effort, or “cognitive load.” For instance, pupils tend to grow when people are concentrating.


While driving, participants had to touch specific targets on a 12-inch touch screen, similar to how they would interact with apps and widgets. They did this while completing three levels of an “N-back task,” a memory test in which the participants hear a series of numbers, 2.5 seconds apart, and have to repeat specific digits.
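
For readers unfamiliar with the task, here is a toy version of one common n-back scoring rule; the exact variant the study used is an assumption here, not taken from the paper.

```python
# Toy n-back scoring: after each new digit, the participant must say the
# digit heard n items earlier. Responses therefore start after n stimuli.
import random

def score_nback(stimuli: list[int], responses: list[int], n: int) -> float:
    """Return the fraction of correct responses for an n-back run."""
    targets = stimuli[:-n]            # the digit n steps back for each response
    correct = sum(t == r for t, r in zip(targets, responses))
    return correct / len(targets)

digits = [random.randint(0, 9) for _ in range(20)]   # one digit every 2.5 s
perfect = digits[:-2]                                # ideal 2-back responses
print(score_nback(digits, perfect, n=2))             # -> 1.0
```

Raising n forces participants to hold more digits in working memory at once, which is how the study dialed cognitive load up and down.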

The participants’ performance changed significantly under different conditions:

  • When interacting with the touch screen, participants drifted side to side in their lane 42% more often. Increasing cognitive load had no additional effect on drift.
  • Touch screen accuracy and speed decreased 58% when driving, then another 17% under high cognitive load.
  • Each glance at the touchscreen was 26.3% shorter under high cognitive load.
  • A “hand-before-eye” phenomenon, in which drivers reached for a control before looking at it, increased from 63% to 71% as memory tasks were introduced.

The team also found that increasing the size of the target areas participants were trying to touch did not improve their performance.

“If people struggle with accuracy on a screen, usually you want to make bigger buttons,” said the paper’s co-lead author, a UW doctoral student in the Allen School. “But in this case, since people move their hand to the screen before touching, the thing that takes time is the visual search.”

Based on these findings, the researchers suggest future in-car touch screen systems might use simple sensors in the car — eye tracking, or touch sensors on the steering wheel — to monitor drivers’ attention and cognitive load. Based on these readings, the car’s system might adjust the touch screen’s interface to make important controls more prominent and safer to access.
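
A minimal sketch of that adaptive idea, assuming a 0-1 cognitive-load estimate from such sensors; the threshold and control names are hypothetical, and the design follows the finding that cutting visual search matters more than enlarging targets.

```python
# Hypothetical sketch of a load-adaptive dashboard: under high estimated
# cognitive load, show fewer controls to reduce visual-search time rather
# than making targets bigger (which the study found did not help).
from dataclasses import dataclass

@dataclass
class Control:
    name: str
    safety_critical: bool

def visible_controls(controls: list[Control], load: float) -> list[str]:
    """`load` is a hypothetical 0-1 estimate from eye tracking or wheel sensors."""
    if load > 0.7:                       # placeholder threshold
        return [c.name for c in controls if c.safety_critical]
    return [c.name for c in controls]

dash = [Control("defrost", True), Control("navigation", True),
        Control("playlist", False)]
print(visible_controls(dash, load=0.8))   # ['defrost', 'navigation']
```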

“Touch screens are widespread today in automobile dashboards, so it is vital to understand how interacting with touch screens affects drivers and driving,” said co-senior author Wobbrock, a UW professor in the Information School. “Our research is some of the first that scientifically examines this issue, suggesting ways for making these interfaces safer and more effective.”

A UW doctoral student in the Information School is co-lead author. Other co-authors are researchers at TRI. This research was funded in part by TRI.

For more information, contact Wobbrock at wobbrock@uw.edu and Fogarty at jfogarty@cs.washington.edu.

AI can pick up cultural values by mimicking how kids learn /news/2025/12/11/ai-training-cultural-values/ Thu, 11 Dec 2025 17:04:44 +0000 /news/?p=90064 A video game shows two kitchens of different sizes.
In the Overcooked video game, players work to cook and deliver as much onion soup as possible. In the study’s version of the game, one player can give onions to help the other, who has further to travel to make the soup. The research team wanted to find out if AI systems could learn altruism by watching different cultural groups play the game.

Artificial intelligence systems absorb values from their training data. The trouble is that values differ across cultures. So an AI system trained on data from the entire internet won’t work equally well for people from different cultures.

But a new University of Washington study suggests that AI could learn cultural values by observing human behavior. Researchers had AI systems observe people from two cultural groups playing a video game. On average, participants in one group behaved more altruistically. The AI assigned to each group learned that group’s degree of altruism and was able to apply that value to a novel scenario beyond the one it was trained on.

The team published its findings Dec. 9 in PLOS One.

“We shouldn’t hard-code a universal set of values into AI systems, because many cultures have their own values,” said senior author Rao, a UW professor in the Paul G. Allen School of Computer Science & Engineering and co-director of the Center for Neurotechnology. “So we wanted to find out if an AI system can learn values the way children do, by observing people in their culture and absorbing their values.”

As inspiration, the team looked to earlier research showing that 19-month-old children raised in Latino and Asian households were more altruistic than those from other cultures.

In the AI study, the team recruited 190 adults who identified as white and 110 who identified as Latino. Each group was assigned an AI agent, a system that can function autonomously.

These agents were trained with a method called inverse reinforcement learning, or IRL. In the more common AI training method, reinforcement learning, or RL, a system is given a goal and gets rewarded based on how well it works toward that goal. In IRL, the AI system observes the behavior of a human or another AI agent, and infers the goal and underlying rewards. So a robot trained to play tennis with RL would be rewarded when it scores points, while a robot trained with IRL would watch professionals playing tennis and learn to emulate them by inferring goals such as scoring points.

This IRL approach more closely aligns with how humans develop.
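
A toy contrast makes the distinction concrete. In this sketch (illustrative only, not the paper's training setup), actions are described by two features, and IRL-style inference recovers reward weights from observed human choices.

```python
# Toy IRL: actions are described by features [soup_delivered, onions_shared].
# Instead of being handed a reward (RL), the agent infers reward weights
# from demonstrations, then acts to maximize the inferred reward.
import numpy as np

ACTIONS = {"deliver_soup": np.array([1.0, 0.0]),
           "share_onion":  np.array([0.0, 1.0])}

def irl_weights(observed_actions: list[str]) -> np.ndarray:
    """Infer reward weights as the empirical feature expectations of the demos."""
    feats = np.mean([ACTIONS[a] for a in observed_actions], axis=0)
    return feats / feats.sum()

def act(weights: np.ndarray) -> str:
    # The trained agent picks the action with the highest inferred reward.
    return max(ACTIONS, key=lambda a: ACTIONS[a] @ weights)

# A group whose members often share onions yields an altruistic agent.
demos = ["share_onion"] * 6 + ["deliver_soup"] * 4
print(irl_weights(demos))         # [0.4 0.6] -> sharing weighted higher
print(act(irl_weights(demos)))    # 'share_onion'
```

The key property, as in the study, is that the inferred weights are a general preference, not a memorized behavior, so they can transfer to new scenarios such as the donation task described below.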

“Parents don’t simply train children to do a specific task over and over. Rather, they model or act in the general way they want their children to act. For example, they model sharing and caring towards others,” said co-author Meltzoff, a UW professor of psychology and co-director of the Institute for Learning & Brain Sciences (I-LABS). “Kids learn almost by osmosis how people act in a community or culture. The human values they learn are more ‘caught’ than ‘taught.’”

In the study, the AI agents were given the data of the participants playing a modified version of the video game Overcooked, in which players work to cook and deliver as much onion soup as possible. Players could see into another kitchen where a second player had to walk further to accomplish the same tasks, putting them at an obvious disadvantage. Participants didn’t know that the second player was a bot programmed to ask the human players for help. Participants could choose to give away onions to help the bot but at the personal cost of delivering less soup.

Researchers found that overall the people in the Latino group chose to help more than those in the white group, and the AI agents learned the altruistic values of the group they were trained on. When playing the game, the agent trained on Latino data gave away more onions than the other agent.

To see if the AI agents had learned a general set of values for altruism, the team conducted a second experiment. In a separate scenario, the agents had to decide whether to donate a portion of their money to someone in need. Again, the agents trained on Latino data from Overcooked were more altruistic.

“We think that our proof-of-concept demonstrations would scale as you increase the amount and variety of culture-specific data you feed to the AI agent. Using such an approach, an AI company could potentially fine-tune their model to learn a specific culture’s values before deploying their AI system in that culture,” Rao said.

Additional research is needed to know how this type of IRL training would perform in real-world scenarios, with more cultural groups, competing sets of values, and more complicated problems.

“Creating culturally attuned AI is an essential question for society,” Meltzoff said. “How do we create systems that can take the perspectives of others into account and become civic minded?”

A UW research engineer in the Allen School and a software engineer at Microsoft who completed this research as a UW student were co-lead authors. Other co-authors include a scientist at the Allen Institute who completed this research as a UW doctoral student; an assistant professor at San Diego State University who completed this research as a postdoctoral scholar at the UW; and a professor in the Allen School.

For more information, contact Rao at rao@cs.washington.edu.

People mirror AI systems’ hiring biases, study finds /news/2025/11/10/people-mirror-ai-systems-hiring-biases-study-finds/ Mon, 10 Nov 2025 15:46:33 +0000 /news/?p=89402 A person's hands type on a laptop.
In a new University of Washington study, 528 people worked with simulated LLMs to pick candidates for 16 different jobs, from computer systems analyst to nurse practitioner to housekeeper. The researchers simulated different levels of racial bias in LLM recommendations for resumes from equally qualified white, Black, Hispanic and Asian men. Photo: Delmaine Donson/iStock

An organization drafts a job listing with artificial intelligence. Droves of applicants polish their materials with chatbots. Another AI system sifts through those applications, passing recommendations to hiring managers. Perhaps AI avatars conduct screening interviews. This is increasingly the state of hiring, as people seek to streamline the stressful, tedious process with AI.

Yet research is finding that hiring bias — against people with disabilities, or certain races and genders — permeates large language models, or LLMs, such as ChatGPT and Gemini. We know less, though, about how biased LLM recommendations influence the people making hiring decisions.

In a new University of Washington study, 528 people worked with simulated LLMs to pick candidates for 16 different jobs, from computer systems analyst to nurse practitioner to housekeeper. The researchers simulated different levels of racial bias in LLM recommendations for resumes from equally qualified white, Black, Hispanic and Asian men.

When picking candidates without AI or with neutral AI, participants picked white and non-white applicants at equal rates. But when they worked with a moderately biased AI, if the AI preferred non-white candidates, participants did too. If it preferred white candidates, participants did too. In cases of severe bias, people made only slightly less biased decisions than the recommendations.

The team presented its findings Oct. 22 at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in Madrid.

“In one survey, 80% of organizations using AI hiring tools said they don’t reject applicants without human review,” said lead author Wilson, a UW doctoral student in the Information School. “So this human-AI interaction is the dominant model right now. Our goal was to take a critical look at this model and see how human reviewers’ decisions are being affected. Our findings were stark: Unless bias is obvious, people were perfectly willing to accept the AI’s biases.”

Participants were given a job description and the names and resumes of five candidates: two white men; two men who were either Asian, Black or Hispanic; and one candidate whose resume lacked qualifications for the job, to obscure the purpose of the study. An example from the study is shown here. Photo: Wilson et al./AIES ‘25

The team recruited 528 online participants from the U.S. through a survey platform, then asked them to screen job applicants. They were given a job description and the names and resumes of five candidates: two white men and two men who were either Asian, Black or Hispanic. These four were equally qualified. To obscure the purpose of the study, the final candidate was of a race not being compared and lacked qualifications for the job. Candidates’ names implied their races — for example, Gary O’Brien for a white candidate. Affinity groups, such as Asian Student Union Treasurer, also signaled race.

In four trials, the participants picked three of the five candidates to interview. In the first trial, the AI provided no recommendation. In the next trials, the AI recommendations were neutral (one candidate of each race), severely biased (candidates from only one race), or moderately biased, meaning candidates were recommended at rates similar to rates of bias in real AI models. The team derived rates of moderate bias using the same methods as in their 2024 study that looked at bias in three common AI systems.

Rather than having participants interact directly with the AI system, the team simulated the AI interactions so they could hew to rates of bias from their large-scale study. Researchers also used AI-generated resumes, rather than real resumes, which they validated. This allowed greater control, and AI-written resumes are increasingly common in hiring.
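
A minimal sketch of how recommendations can be simulated at controlled bias rates; the scoring scheme and bias values below are placeholders, not the study's measured rates, and all candidate names except Gary O'Brien (from the study) are made up.

```python
# Hypothetical simulation of biased recommendations: rank candidates with a
# score bump for the favored group, then recommend the top k. bias = 0.0 is
# neutral, ~0.3 moderate, 1.0 severe (placeholder values, not the study's).
import random

CANDIDATES = {"white": ["Gary O'Brien", "Tom Novak"],
              "non_white": ["Wei Chen", "Luis Ortiz"]}

def recommend(bias: float, favored: str, k: int = 2) -> list[str]:
    scored = []
    for group, names in CANDIDATES.items():
        for name in names:
            bump = bias if group == favored else 0.0
            scored.append((random.random() + bump, name))
    return [name for _, name in sorted(scored, reverse=True)[:k]]

print(recommend(bias=1.0, favored="white"))   # severe: both picks one group
```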

“Getting access to real-world hiring data is almost impossible, given the sensitivity and privacy concerns,” said senior author Caliskan, a UW associate professor in the Information School. “But this lab experiment allowed us to carefully control the study and learn new things about bias in human-AI interaction.”

Without suggestions, participants’ choices exhibited little bias. But when provided with recommendations, participants mirrored the AI. In the case of severe bias, choices followed the AI picks around 90% of the time, rather than nearly all the time, indicating that even if people are able to recognize AI bias, that awareness isn’t strong enough to negate it.

“There is a bright side here,” Wilson said. “If we can tune these models appropriately, then it’s more likely that people are going to make unbiased decisions themselves. Our work highlights a few possible paths forward.”

In the study, bias dropped 13% when participants began with an implicit association test, intended to detect subconscious bias. So companies including such tests in hiring trainings may mitigate biases. Educating people about AI can also improve awareness of its limitations.

“People have agency, and that has huge impact and consequences, and we shouldn’t lose our critical thinking abilities when interacting with AI,” Caliskan said. “But I don’t want to place all the responsibility on people using AI. The scientists building these systems know the risks and need to work to reduce systems’ biases. And we need policy, obviously, so that models can be aligned with societal and organizational values.”

A UW doctoral student in the Information School and a postdoctoral scholar at Indiana University are also co-authors on this paper. This research was funded by the U.S. National Institute of Standards and Technology.

For more information, contact Wilson at kywi@uw.edu and Caliskan at aylin@uw.edu.

Q&A: How video games can lead people to more meaningful lives /news/2025/09/30/qa-how-video-games-can-lead-people-to-more-meaningful-lives/ Tue, 30 Sep 2025 15:30:05 +0000 /news/?p=89451 Gamer using joystick controller
UW researchers discuss their study, which surveyed 166 gamers about how video games sparked meaningful changes in their lives.

Even though video games have grown as an artistic medium, they are still often written off as mindless entertainment. Research is increasingly exploring meaningful gaming experiences. Less studied, though, are the ways such experiences can alter people’s lives long term.

In a new study, University of Washington researchers surveyed gamers about video games’ effects. Of the 166 respondents researchers asked about meaningful experiences, 78% said such experiences had altered their lives. Researchers then pulled recurring themes from the responses — such as the power of rich storytelling — so that developers, gamers and even parents or teachers might focus on those elements.

The team will present its findings Oct. 14 at the Annual Symposium on Computer-Human Interaction in Play in Pittsburgh.

To learn more about the paper, UW News spoke with lead author Nisha Devasia, a UW doctoral student in human centered design and engineering; co-senior author Julie Kientz, a UW professor and chair in human centered design and engineering; and co-senior author Jin Ha Lee, a UW professor in the Information School.

What are the most significant findings in the study?

Nisha Devasia: We highlighted three conclusions drawn from modeling the data. The first is that playing games during stressful times was strongly correlated with positive outcomes for physical and mental health. For example, during COVID, people played games they felt strongly improved their mental health, such as Stardew Valley. Others mentioned that games that required movement, or games that had characters with interesting physical abilities, inspired them to get outside or try new sports. Many participants also said that they gained a lot of insight from the game narrative. Story-based games often tell a sort of hero’s journey, for instance. People reported that the insight they gained from those stories correlated to their own self-reflection and identity building.

Finally, most people had these meaningful experiences in very early adulthood or younger, when they’re still trying to figure out who they are and what they want to be in the world. Playing as a character and seeing your choices change the course of events is pretty unique to games, compared with other narrative media like novels or movies.

Do any individual stories really stand out to you from the survey you took?

ND: All the stories about Final Fantasy VII, because that’s the game that I love. I’m actually sitting in my childhood bedroom right now and the wall behind me is covered in Final Fantasy VII posters. The quote we used in the title also really resonated with me: “I would not be this version of myself today without these experiences.” I definitely cannot imagine what I would be doing in my life if I had not played Final Fantasy VII when I did.

People also said things like, “This helped me build the skills that ended up being my career. I learned how to program because I wanted to make games.” I worked in the gaming industry and can verify that’s true for many people in the industry.

How should these findings fit into how we view games as a society?

Julie Kientz: People have a tendency to treat technology as a monolith, as if video games are either good or bad, but there’s so much more nuance. The design matters. This study hopefully helps us untangle the positive elements. Certainly, there are bad elements — toxicity and addictiveness, for example. But we also see opportunities for growth and connection. Some people in the study met their spouses through games.

Jin Ha Lee: What Nisha studies is essentially what I live. I’m a gamer, and I have definitely started playing certain games with my two children specifically because I wanted to have more conversations with them. When my daughter plays games with interesting stories, we have the opportunity to talk about our lives as we analyze the story. What were these people thinking? Why did they make certain decisions? 

As researchers, we develop games for learning, for instance, for teaching people about misinformation or AI, or for promoting digital civic engagement, because we want to foster meaningful experiences. But a lot of the existing research just focuses on the short-term effects of games. This study really helps us understand what actually caused a game to make a difference in someone’s life.

What societal changes could we make in our approach to gaming?

JK: Because people have a tendency to oversimplify things, some of the proposed solutions can be counterproductive. For instance, limiting kids’ screen time can actually interfere with positive experiences, especially if someone is immersed in the storyline and identifies with the characters. If 30 minutes into a game, a kid’s Nintendo Switch turns off because of parental controls, that might hinder the ability to have a positive experience. If we aren’t using these tools consciously, it might actually lead to kids playing more casual, junk games, because those can be played in 30 minutes.

ND: You see this with discourse around game addiction, too. Sometimes excessive gaming is because of dark patterns in a game’s design. But it is often a symptom of someone going through something difficult in their life, and the game happens to be a way to cope. As our study shows, there’s the potential for growth in that coping.

JHL: There’s also a place for games and media that we consider “bad.” You might play a game that’s so horrible that you make a meme out of it, and the jokes you share become a way to build community. Online communities can grow into offline events and friendships. But that isn’t necessarily obvious if you just view gaming as something you need to protect your children from.

What technological changes might accentuate the meaningful effects of games?

JHL: Games are naturally interactive and complex, so there’s a lot of opportunity for critical engagement beyond just the gameplay. There’s music, there’s art, there’s storytelling. All of these offer space for meaningful interaction. Designers can skillfully incorporate these elements to prompt reflection, evoke emotions, or challenge players’ perspectives.

ND: We’re calling our next study Video Game Book Club. Right now I’m building a tool to allow people to annotate their gameplay as if they were writing in the margins of a book. While you play, a little pop-up lets you make a note. At the end, an interface pops up showing your gameplay stream and all the notes you made, which should allow players to reflect on what they were thinking as they were playing.

We’re also working on a reflection chatbot. Every time after you play a session that’s 30 minutes to an hour long, you’ll interact with this bot that prompts you to think critically about the experience, much like we’re taught to relate to literature. What was really memorable? How is this connected to your life? 

Co-authors include a UW doctoral student in human centered design and engineering and a UW doctoral student in the Information School.

For more information, contact Devasia at ndevasia@uw.edu, Kientz at jkientz@uw.edu and Lee at jinhalee@uw.edu.

A simple intervention significantly improved patent outcomes for women inventors /news/2025/09/29/women-inventors-patent-outcomes-improved/ Mon, 29 Sep 2025 16:00:52 +0000 /news/?p=89415 a pen sits on a patent application
Research by the University of Washington and the USPTO found that some simple interventions increased the probability that female inventors would get patents by 12%. For first-time applicants, that probability increased to 17%. Photo: iStock

While innovation is core to American identity, women inventors were named on only 13% of 2019 U.S. patents.

Research by the University of Washington and the United States Patent and Trademark Office, or USPTO, found that some simple interventions increased the probability that female inventors would get patents by 12%. For first-time applicants, that probability increased to 17%. The study, the first randomized controlled trial of inventors at the USPTO, followed inventors who applied “pro se,” meaning without the help of a lawyer. Researchers randomly assigned some senior patent examiners to provide extra help and encouragement navigating the complicated examination process.

The paper was published in the American Economic Journal: Economic Policy.

The study began in 2014, when USPTO created a unit to help pro se inventors through the patent process. The office selected 15 senior patent examiners, who received 20 hours of training on strategies to better assist pro se inventors. In the span of a year, 2,273 applications were divided between the treatment and control arms. Of those applications, 16% had more than half women inventors.

In the treatment arm, examiners used more encouraging language and gave more detailed responses in their first written decisions. They also prompted the applicants to call for an interview about the decision. Interviews increased 25% for both genders, but majority-women teams were 8% more likely to work out specific changes in those interviews.

“This was a very effective, fairly low-cost program,” said author Teodorescu, a UW assistant professor in the Information School. “There’s this ideal of the garage inventor tinkering with something, coming up with an idea to start a company. That group of people usually doesn’t have access to lawyers, so they apply as individuals. This intervention helped more people find success.”

A full list of co-authors is included with the paper.

For more information, contact Teodorescu at miketeod@uw.edu.

Q&A: UW professor’s book explores how ‘technology is never culturally neutral’ /news/2025/09/19/digital-culture-shock-katharina-reinecke/ Fri, 19 Sep 2025 15:36:35 +0000 /news/?p=89273 The cover of the book Digital Culture Shock.
In her new book, Katharina Reinecke explores how “digital culture shock” manifests in the world, in ways innocuous and sometimes harmful. Photo: Princeton University Press

“Culture shock” describes the overwhelm people can feel when suddenly immersed in a new culture. The flurry of unfamiliar values, aesthetics and language can disorient, discomfit and alienate. In her new book, “Digital Culture Shock,” Katharina Reinecke argues that technology can similarly affect people. Reinecke, a University of Washington professor in the Paul G. Allen School of Computer Science & Engineering, uses the phrase to “describe the experience and influence of actively or passively using technology that is not in line with one’s cultural practices or norms.”

The book explores how self-driving cars trained on U.S. streets would likely struggle to translate to Cairo, with its drastically different road norms. It looks at how a portal with a complex search interface can overwhelm Americans used to Google’s minimalist design. And Reinecke digs into how so much technology emanating from specific regions, such as the Bay Area, can lead to forms of cultural imperialism.

UW News spoke with Reinecke about the book and how digital culture shock manifests in the world, in ways innocuous and sometimes harmful.

What was the spark that led to this book? 

Katharina Reinecke: Maybe it was less of a spark and more of an embarrassment, but around 20 years ago I worked in Rwanda on developing an e-learning application for agricultural advisors in the country. When I presented the software I’d developed to some of the advisors, they very politely told me that they didn’t like the way it looked and didn’t find it intuitive to use. I realized that my cultural background had influenced all the little design decisions made while developing it: whether the interface should be colorful or simply gray and white, which I thought most people would prefer; whether users should be guided through the application or mostly explore on their own. The answer to any of these questions depends on a user’s upbringing, education, norms and values.

Once I realized that technology is never culturally neutral, I set out to earn a doctorate on this topic and the rest is history. Over the years, I kept collecting similar technology blunders. It turns out, like me, most people have no idea that their culture affects how they use technology and how they develop it. It’s just not something we usually think about or get taught.


Is there an example of digital culture shock that stands out to you the most or is particularly illustrative? Why?

KR: AI is all over the news these days, so let me start there. When ChatGPT and other generative AI tools came out, I think it really illustrated how its developers had made several design decisions that make these tools work well for some, but not all people. They are trained on mostly English data sources on the web, so early language models told us things like “I love my country. I am proud to be an American” or “I grew up in a Christian home and attended church every week.” Obviously this would make many people aware that the AI is different from themselves.

We found that the way these language models speak and the values they convey align with only a tiny portion of the world’s population, while others can experience these interactions as a form of digital culture shock. And this is true for any AI application out there, from text-to-image models that generate pictures of churches when asked for houses of worship (as if churches are the only reasonable response) to self-driving cars trained in the U.S., which would likely not succeed in places where tuk-tuks and donkey carts share the road.

You discuss how much of the study of technology is conducted by and with people who are WEIRD, or Western, Educated, Industrial, Rich and Democratic. What are the risks of the homogenous digital culture that can emerge from this?

KR: The biggest risk is that technology will continue to be designed in ways that work for people most similar to those in the largest technology hubs, but that it is less usable, intuitive, trustworthy and welcoming to the rest of us. This risk has ethical consequences because technology should be equally usable and useful for all, especially given companies’ enormous profits. There are also several examples in my book that clearly show technology products can struggle to gain market share in cultures it was not designed for, so ignoring this is also risky for companies.

As I discuss in the book, digital technology has been called out as a form of cultural imperialism because it embeds values and norms that are frequently misaligned with those of its users. This would be less of a problem if technology were designed in various technology hubs around the world, representing a diversity of cultures and values. But it is not. Most of the technology people use, no matter where in the world they are, was designed in the U.S., or it was influenced by user interface norms and frameworks developed in the U.S. So we’ve gotten ourselves into a situation where technology is slowly homogenizing and where people can best use it if they think and feel like its developers.

You finish the book with 10 misassumptions about technology and culture. What’s the single greatest, or most consequential, misassumption?

KR: To me, it is that people tend to think that one size fits all. They design technology and expect it to work for everyone, which is obviously not true.

For example, the Western obsession with productivity and efficiency often comes at the expense of interpersonal interactions. So many technology products are hyperfocused on making our days more efficient. There’s an app for any of our “problems,” and all of them try to somehow get us to function better, faster and more productively. But this laser-focus on streamlining misses the point that in many cultures, productivity works differently. In many East Asian cultures, for example, it takes time to build relationships before people will trust another person’s information — or that given by AI. So we need to get rid of the misassumption that technology design can be universal. My job would certainly be so much easier if people would stop believing this!  

For more information, contact Reinecke at reinecke@cs.washington.edu.
