Su-In Lee – UW News

Q&A: Transparency in medical AI systems is vital, UW researchers say
Sept. 10, 2025

Illustration of a doctor surrounded by technology symbols.
In a new paper, UW researchers argue that a key standard for deploying medical AI is transparency — that is, using various methods to clarify how a medical AI system arrives at its diagnoses and outputs.

While debate rumbles about how generative artificial intelligence will change jobs, AI is already altering health care. AI systems are being used for everything from drug discovery to diagnostic tasks in radiology and clinical note-taking. A recent survey found that most respondents are optimistic about AI’s potential to make health care more efficient and accurate, and that nearly half have used AI tools for work.

Yet AI remains plagued with bugs, hallucinations, privacy concerns and other ethical quandaries, so deploying it for sensitive and consequential work comes with major risks. In a review paper published Sept. 9 in Nature Reviews Bioengineering, UW researchers argue that a key standard for deploying medical AI is transparency — that is, using various methods to clarify how a medical AI system arrives at its diagnoses and outputs.

UW News spoke with the paper’s three authors about what transparency means for medical AI: co-lead authors Chanwoo Kim and Soham Gadgil, both UW doctoral students in the Paul G. Allen School of Computer Science & Engineering, and senior author Su-In Lee, a professor in the Allen School.

What makes discussions of ethics in medical AI distinct from the broader discussions around AI ethics?

Chanwoo Kim: The biases built into AI systems and the risk of incorrect outputs are critical problems, especially in medicine, because they can directly impact people’s health and even determine life-altering outcomes.

The foundation for addressing those concerns is transparency: being open about the data, training, and testing that went into building a model. Knowing if an AI model is biased starts with understanding the data it was trained on. And the insights gained from such transparency can illuminate sources of bias and pathways for systematically mitigating these risks.

Su-In Lee: A study from our lab is a good example. During the height of the COVID-19 pandemic, there was a surge of AI models that took chest X-rays and then predicted whether the patient had COVID-19. In our study, we showed that hundreds of these models were wrong: They claimed accuracy close to 99% or 100% on some data sets, but on external hospital data sets that accuracy dropped sharply, which indicates that the models failed to generalize to real-world clinical settings. We used a technique that revealed the models were relying on shortcuts: In the corners of the X-ray images, there are sometimes different kinds of text marks. We showed that the models were using these marks, which led the models to inaccurate results. Ideally, we’d want the models to look at the X-ray images themselves.
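For readers who want to see what such a check might look like in practice, here is a minimal sketch, in Python, of one simple probe for corner-text shortcuts: hide the corners of each X-ray and measure how much the model’s predictions shift. The `model` object and image arrays are hypothetical placeholders, and the study Lee describes used a dedicated Explainable AI technique rather than this exact test.

```python
# Minimal sketch of an occlusion check for corner-text shortcuts in a chest X-ray
# classifier. `model` and the image batch are hypothetical placeholders; the study
# described above used a dedicated Explainable AI technique, not this exact test.
import numpy as np


def mask_corners(images: np.ndarray, frac: float = 0.15) -> np.ndarray:
    """Zero out the four corners of each (..., H, W) image, where text marks often sit."""
    masked = images.copy()
    h, w = images.shape[-2], images.shape[-1]
    dh, dw = int(h * frac), int(w * frac)
    for rows in (slice(0, dh), slice(h - dh, h)):
        for cols in (slice(0, dw), slice(w - dw, w)):
            masked[..., rows, cols] = 0.0
    return masked


def corner_sensitivity(model, images: np.ndarray) -> float:
    """Average change in predicted COVID-19 probability when only the corners are hidden.

    A large shift suggests the model leans on corner artifacts rather than the lungs.
    """
    p_full = model.predict(images)                  # hypothetical model returning probabilities
    p_masked = model.predict(mask_corners(images))
    return float(np.mean(np.abs(p_full - p_masked)))
```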

Your paper brings up “Explainable AI” as a route to transparency. Can you describe what that is?

SL: Explainable AI as a field started about a decade ago, when people were trying to interpret the outputs from the new generation of complex, “black box” machine learning models.

Here’s an example: Imagine that a bank customer wants to know if they can get a loan. The bank will then use lots of data about that person, including age, occupation, credit score and so on. They’ll feed that data to a model, which will make a prediction about whether this person is going to pay off the loan. A “black box” model would let you see only the result. But if the bank’s model lets you see the factors that led to its decision, you can better understand the reasoning process. That’s the core idea of Explainable AI: to help people better understand AI’s process.

There are a variety of methods, which we explain in our review paper. What I described in the bank example is called a “feature attribution” method. It’s attributing its output back to the input features.
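To make the idea concrete, here is a minimal sketch of feature attribution using the open-source shap library on a synthetic loan model. The features, data and gradient-boosting model are invented for illustration and are not from the review paper.

```python
# Minimal sketch of feature attribution on a hypothetical loan-repayment model.
# The dataset, feature names and model choice are illustrative assumptions.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "age": rng.integers(21, 70, n),
    "income": rng.normal(60_000, 15_000, n),
    "credit_score": rng.integers(300, 850, n),
    "debt_ratio": rng.uniform(0, 1, n),
})
# Synthetic label: repayment mostly driven by credit score and debt ratio.
y = ((X["credit_score"] > 650) & (X["debt_ratio"] < 0.5)).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer attributes each prediction back to the input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:5])

# Per-applicant contributions: positive values push the prediction toward "will repay."
print(pd.DataFrame(shap_values, columns=X.columns).round(3))
```

Each row of the printed table is one applicant, and each entry shows how much that feature pushed the model’s output up or down relative to a baseline, which is the essence of a feature attribution method.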

How can regulation help with some of the risks of medical AI?

CK: In the United States, the FDA regulates medical AI under the Software as a Medical Device, or SaMD, framework. Recently, regulators have focused on coming up with a framework to enforce transparency. This includes making clear what an AI system is designed to do — stating specific use cases and the standards for accuracy and limitations in real clinical settings, which depend on knowing how a model works. Also, medical AI is used in clinical settings where things change dynamically and AI performance can fluctuate, so recent regulations are also trying to ensure that medical AI models are monitored continuously after deployment.

Soham Gadgil: New medical devices or drugs go through rigorous testing and clinical trials to gain FDA approval. Having regulations that require AI systems to undergo similarly rigorous testing and meet similar standards is important. Our lab has shown that these models, even those that seem accurate in tests, don’t always generalize to the real world.

In my opinion, many of the organizations developing these models do not have incentives to focus on transparency. Right now, the paradigm is that if your model performs better on certain benchmarks — the specific, standardized, public tests that AI organizations use to compare or rank their models — then it’s good enough to use, and it will probably get good adoption. However, this paradigm is incomplete, since these models can still hallucinate and generate false information. Regulation can help incentivize a focus on transparency alongside model performance.

What role do you see clinicians playing in the adoption of AI transparency?

CK: Clinicians are critical in achieving transparency in medical AI. If a clinician uses an AI model to help with a diagnosis or treatment, then they are responsible for explaining the rationale behind the model’s predictions, because they are ultimately responsible for the patient’s health. So clinicians need to be familiar with AI modeling techniques and even basic Explainable AI techniques, so that they can understand how the AI models work — not perfectly, but to the extent that they can explain the mechanism to patients.

SG: We collaborate with clinicians for most of our lab’s biomedical research projects. They give us insight on what we should be trying to explain. They tell us when Explainable AI solutions are correct, whether they’re applicable in health care, and ultimately whether these explanations will be useful for patients and clinicians.

What do you want the public to know about AI transparency?

SL: We should not just blindly trust what AI is doing. Chatbots hallucinate sometimes, and medical AI models make mistakes. Last year, in another study, we audited five dermatology AI systems that you can easily get through an app store. When you see something strange on your skin, you take a picture, and the apps tell you whether or not it’s melanoma. Our work showed that the results were frequently inaccurate, much like the COVID-19 AI systems. We used a new type of Explainable AI technique to show why these systems failed in certain ways — what’s behind these mistakes.

SG: The first step toward using AI critically can be simple. For example, if someone uses a generative model to get preliminary medical information for some minor ailment, they could just ask the model itself to give an explanation. While the explanation might sound plausible, it should not be taken at face value. If the explanation points to sources, the user should verify that those sources are trustworthy and confirm that the information is accurate. For anything potentially consequential, clinicians need to be involved. You should not be asking ChatGPT whether you’re having a heart attack.

For more information, contact Lee at suinlee@cs.washington.edu.

Faculty/staff honors: Rising Star Award for DEI, honors for ornithological work, and more
April 22, 2024

Bronze ‘W’ statue in front of the UW campus.

Recent recognition for the University of Washington includes a Rising Star Award, honors for distinguished ornithological work and a Gold Medal Award for Impact in Psychology.

Karen Thomas-Brown receives Rising Star Award

Karen Thomas-Brown, UW associate dean of diversity, equity & inclusion (DEI) for the College of Engineering, was given the Rising Star Award in March by the National Association of Diversity Officers in Higher Education.

NADOHE’s Inclusive Excellence Awards recognize and honor achievements and contributions to guide higher education toward inclusivity and institutional transformation through research, leadership or service.

“This award is a significant acknowledgment that the body of work we pursue in the Office of Inclusive Excellence is on point as it informs the policies and practices of the college as a whole and is relevant to research,” Thomas-Brown said.

The Rising Star honoree is a NADOHE member who has been a chief or senior diversity officer for at least three years but no more than 10 years. A nomination statement details the person’s contributions to advance the understanding of DEI in higher education.

Thomas-Brown leads the College of Engineering’s efforts to be an accessible, welcoming and inclusive community. The award recognizes her contributions to advancing DEI initiatives, including developing best practices and guidelines and working to implement programs that increase participation of underserved groups.

Thomas-Brown holds a doctorate in geography from the University of the West Indies and certificates in DEI, change management and leadership from Cornell University.

Professor of biology honored for ‘distinguished ornithological work’

Dee Boersma, UW professor of biology, received a British Ornithologists’ Union (BOU) Council award during the Pacific Seabird Group’s annual conference banquet in February. BOU Council awards honor an individual’s distinguished ornithological work.

“To have the British honor me is high praise,” Boersma said. “I just hope we can reduce the impact of people on the natural world.”

Boersma was selected for excellence in scientific research, practical conservation, scientific monitoring and dissemination of science for public awareness. The committee particularly noted her devotion to documenting varying aspects of penguins’ lives and her contribution to understanding the conservation of all species.

Boersma directs the Center for Ecosystem Sentinels and is a member of the International Union for the Conservation of Nature SSC Penguin Specialist Group. As a scientific fellow for the Wildlife Conservation Society, she also leads research on Magellanic Penguins.

Affiliate professor receives Gold Medal Award

Jennifer Freyd, UW affiliate professor of psychology and of gender, women & sexuality studies, received a Gold Medal Award for Impact in Psychology from the American Psychological Foundation (APF). The award recognizes work that is impactful, innovative and transformational.

Freyd is known as a pioneer in the fields of trauma psychology and institutional courage. An activist working to address sexual violence, Freyd is also a professor emeritus of psychology at the University of Oregon and the founder and president of the Center for Institutional Courage. Her work has influenced approaches, policy frameworks, legal considerations and social attitudes.

“I am grateful for this award,” Freyd said in an APF release. “I am also hopeful that this acknowledgement will help in our efforts to investigate and prevent betrayal trauma and institutional betrayal while discovering how to nurture institutional courage.”

91̽study named finalist for Cozzarelli Prize

A study from the UW was named a finalist for the 2023 Proceedings of the National Academy of Sciences Cozzarelli Prize, which “acknowledges papers that reflect scientific excellence and originality.”

The paper, published in Proceedings of the National Academy of Sciences, was written by a lead author who is an assistant professor at Utah State University and a former UW postdoctoral researcher in the Abrahms Lab; senior author Briana Abrahms, UW assistant professor of biology; P. Dee Boersma, UW professor of biology; and a UW research scientist/engineer in biology, using long-term data collected by the Center for Ecosystem Sentinels.

The paper focuses on how climate change will reshape ecosystems worldwide through short-term, extreme events and long-term changes. Ecologists call the short-term events “pulses” and the long-term changes “presses.” The study shows how different presses and pulses impacted Magellanic penguins — a migratory marine predator — over nearly four decades at their historically largest breeding site in Punta Tombo, Argentina.

“For conservation to be most effective, we need to know where, when and how to apply our limited resources,” Abrahms told UW News last year. “Information generated by this study tells us which climate effects we need to worry about and which ones we don’t — and therefore can help us focus on measures that we know will have a positive impact.”

Su-In Lee receives Ho-Am Prize in Engineering

Su-In Lee, UW professor in the Paul G. Allen School of Computer Science & Engineering, was selected as the 2024 Samsung Ho-Am Prize Laureate in Engineering for her pioneering contributions to the field of explainable artificial intelligence.

Established in 1990, the Ho-Am Prize honors people of Korean heritage who have contributed to academia, the arts and social development, or who have furthered the welfare of humanity in their respective fields.

Lee is the first woman to receive the engineering prize.

Lee pioneered the SHAP framework and subsequent algorithms, revolutionizing the ability to interpret the results of machine learning models. Her extensive contributions span foundational AI, computational molecular biology and clinical medicine.

Through her advancements in explainable AI technology, Lee has played a pivotal role in the development of clinical AI systems capable of predicting and elucidating various diagnoses and outcomes. Furthermore, her work has led to significant AI-driven discoveries aimed at enhancing our understanding of the origins and treatment of complex diseases such as cancer and Alzheimer’s disease.

“This is truly an extraordinary honor for me, and I’m profoundly grateful for the recognition,” Lee said. “Among countless deserving researchers, I feel deeply humbled to have been selected. Receiving an award of this magnitude entails not just privilege but also a significant responsibility. One of the most fulfilling aspects of my role as a faculty member and scientist is being able to serve as an inspiration for young individuals. As AI continues to revolutionize both science and society, my hope is that this achievement will inspire others to tackle crucial challenges aimed at enhancing science and health for all.”

Prescience: Helping doctors predict the future
Oct. 10, 2018

During surgery, anesthesiologists monitor and manage patients to make sure they are safe and breathing well. But these doctors can’t always predict when complications will arise.

This research is featured on the cover of the October 2018 issue of Nature Biomedical Engineering.

Now researchers at the University of Washington have developed a new machine-learning system, called Prescience, which uses input from patient charts and standard operating room sensors to predict the likelihood that a patient will develop hypoxemia — a condition in which blood oxygen levels dip slightly below normal. Hypoxemia can lead to serious consequences, such as infections and abnormal heart behavior.

Prescience also provides real-world explanations behind its predictions. With this information, anesthesiologists can better understand why a patient is at risk for hypoxemia and prevent it before it happens. The team, which published its findings Oct. 10 in Nature Biomedical Engineering, estimates that Prescience could improve the ability of anesthesiologists to anticipate and prevent 2.4 million more hypoxemia cases in the United States every year.

“Modern machine-learning methods often just spit out a prediction result. They don’t explain to you what patient features contributed to that prediction,” said Su-In Lee, an associate professor in the UW’s Paul G. Allen School of Computer Science & Engineering and senior author of the paper. “Our new method opens this black box and actually enables us to understand why two different patients might develop hypoxemia. That’s the power.”

Su-In Lee (left) and Scott Lundberg set out to create a machine-learning system that predicts low blood oxygen during surgery. It also provides real-world explanations behind its predictions. Photo: Mark Stone/University of Washington

Lee and Scott Lundberg, a doctoral student in the Allen School, started the project by meeting with collaborators from UW Medicine to find out what they needed in the operating room.

“One of the things the anesthesiologists said was: ‘We are not really satisfied with just a prediction. We want to know why,'” Lee said. “So that got us thinking.”

Lee and Lundberg set out to create a machine-learning system that could both make predictions and explain them. First, they acquired a dataset of 50,000 real surgeries from UW Medical Center and Harborview Medical Center in Seattle. These data include patient intake information like age and weight as well as real-time, minute-by-minute information — heart rate, blood oxygen levels and more — throughout the surgeries. The scientists used all of these data to teach Prescience to make predictions.


The team wanted Prescience to solve two different kinds of problems. Prescience needed to look at pre-surgery information and predict whether any given patient would develop hypoxemia while under anesthesia. Prescience also had to predict hypoxemia at any point throughout surgery by looking at real-time information. Finally, Lee and Lundberg developed methods to train Prescience to generate understandable explanations behind its predictions.

For the pre-surgery data, Prescience found that body mass index was one important feature that contributed to a prediction that a patient would experience hypoxemia during surgery. But during surgery, the blood oxygen levels themselves contributed the most to a prediction.
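For a concrete picture of this kind of pipeline, here is a minimal sketch: a gradient-boosted tree model trained on tabular operating-room features, with SHAP attributions indicating which inputs drove an individual hypoxemia prediction. The feature names, synthetic data and XGBoost model are illustrative assumptions, not the actual Prescience implementation.

```python
# Minimal sketch of a prediction-plus-explanation pipeline like the one described
# above. Feature names, data and model choice are illustrative assumptions, not
# the actual Prescience system.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb

rng = np.random.default_rng(1)
n = 5000
X = pd.DataFrame({
    "bmi": rng.normal(27, 5, n),            # pre-surgery intake feature
    "age": rng.integers(18, 90, n),
    "spo2": rng.normal(97, 2, n),           # current blood oxygen saturation
    "heart_rate": rng.normal(80, 12, n),
    "tidal_volume": rng.normal(450, 60, n),
})
# Synthetic label: near-term hypoxemia is more likely when SpO2 is already low
# and BMI is high (purely for demonstration).
risk = 0.05 + 0.4 * (X["spo2"] < 94) + 0.15 * (X["bmi"] > 35)
y = (rng.random(n) < risk).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)

# Attribute one patient's risk prediction back to the input features.
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X.iloc[[0]])
print(pd.Series(contributions[0], index=X.columns).sort_values(key=abs, ascending=False))
```

In a sketch like this, the largest attributions are the kind of explanation the team wanted Prescience to surface alongside each risk estimate.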

With this information in mind, it was time to put Prescience to the test.

Lee and Lundberg created a web interface that ran anesthesiologists through pre-surgery and real-time cases from surgeries in the dataset that were not used to train Prescience. For the real-time test, the researchers specifically chose cases that would be hard to predict, such as when a patient’s blood oxygen level is stable for 10 minutes and then drops.

This web interface ran anesthesiologists through pre-surgery and real-time cases. For some cases, the doctors got an additional bar of information from Prescience. Photo: Mark Stone/University of Washington

“We wanted to know if this was going to be informative to anesthesiologists,” said Lundberg, who is the first author on the paper. “So for some of their cases, they got a bar of additional information from Prescience.”

Prescience improved the doctors’ ability to correctly predict a patient’s hypoxemia risk by 16 percent before a surgery and by 12 percent in real time during a surgery. Overall, with the help of Prescience, the anesthesiologists were able to correctly distinguish between the two scenarios nearly 80 percent of the time both before and during surgery.

“This research will allow us to better anticipate complications and target our treatment to each patient,” said co-author Dr. Monica Vavilala, professor of anesthesiology and pain medicine at the UW School of Medicine and director of the Harborview Injury Prevention & Research Center. “If we know there’s one aspect that’s causing the problem, then we can approach that first and more quickly. This could really change the way we practice, so this is a really big deal.”

Four members of the team behind Prescience. Left to right: Bala Nair, Su-In Lee, Monica Vavilala and Scott Lundberg. Photo: Mark Stone/University of Washington

Prescience isn’t quite ready to be in operating rooms yet. Lee and Lundberg plan to continue working with anesthesiologists to improve Prescience and give it an interface that’s both intuitive and useful. In addition, the team hopes that later versions of Prescience will be able to predict other harmful conditions, such as low blood pressure, and recommend treatment plans.

Regardless of Prescience’s future, one point is clear: This technology is meant to support doctors, not replace them, Lundberg said.

“Prescience doesn’t treat anyone,” he said. “Instead it tells you why it’s concerned, which then enables the doctor to make better treatment decisions.”

Co-author Dr. Bala Nair, research associate professor of anesthesiology and pain medicine at the UW School of Medicine, helped Lee and Lundberg acquire the dataset. Another co-author, Dr. Jerry Kim, assistant professor of anesthesiology and pain medicine at Seattle Children’s Hospital, proposed the initial project with Lee. Other co-authors include Shu-Fang Newman, UW Medicine Anesthesiology & Pain Medicine; Dr. Mayumi Horibe, Veterans Affairs Puget Sound Health Care System; and Dr. Michael Eisses, Dr. Trevor Adams, Dr. David Liston and Dr. Daniel Low, Seattle Children’s Hospital. This research was funded by the National Science Foundation (DBI-1355899), an NSF Graduate Research Fellowship (DGE-1256082), and a UW eScience/ITHS seed grant, “Machine Learning in Operating Rooms.”

###

For more information, contact Lee at suinlee@cs.washington.edu.
