ChatGPT is biased against resumes with credentials that imply a disability — but it can improve

June 21, 2024

A hand holds a phone with the ChatGPT app open.
UW researchers found that ChatGPT consistently ranked resumes with disability-related honors and credentials — such as the “Tom Wilson Disability Leadership Award” — lower than the same resumes without those honors and credentials. But when researchers customized the tool with written instructions directing it not to be ableist, the tool reduced this bias for all but one of the disabilities tested. Photo: Solen Feyissa/Unsplash

While seeking research internships last year, University of Washington graduate student Kate Glazko noticed recruiters posting online that they’d used OpenAI’s ChatGPT and other artificial intelligence tools to summarize resumes and rank candidates. Automated screening has long been common in hiring. Yet Glazko, a doctoral student in the UW’s Paul G. Allen School of Computer Science & Engineering, studies how generative AI can replicate and amplify real-world biases — such as those against disabled people. How might such a system, she wondered, rank resumes that implied someone had a disability?

In a new study, UW researchers found that ChatGPT consistently ranked resumes with disability-related honors and credentials — such as the “Tom Wilson Disability Leadership Award” — lower than the same resumes without those honors and credentials. When asked to explain the rankings, the system spat out biased perceptions of disabled people. For instance, it claimed a resume with an autism leadership award had “less emphasis on leadership roles” — implying the stereotype that autistic people aren’t good leaders.

But when researchers customized the tool with written instructions directing it not to be ableist, the tool reduced this bias for all but one of the disabilities tested. Five of the six implied disabilities — deafness, blindness, cerebral palsy, autism and the general term “disability” — improved, but only three ranked higher than resumes that didn’t mention disability.

The team presented its findings June 5 at the 2024 ACM Conference on Fairness, Accountability, and Transparency in Rio de Janeiro.

“Ranking resumes with AI is starting to proliferate, yet there’s not much research behind whether it’s safe and effective,” said Glazko, the study’s lead author. “For a disabled job seeker, there’s always this question when you submit a resume of whether you should include disability credentials. I think disabled people consider that even when humans are the reviewers.”

Researchers used the publicly available curriculum vitae (CV) of one of the study’s authors, which ran about 10 pages. The team then created six enhanced CVs, each implying a different disability by including four disability-related credentials: a scholarship; an award; a diversity, equity and inclusion (DEI) panel seat; and membership in a student organization.

Researchers then used ChatGPT’s GPT-4 model to rank each enhanced CV against the original version for a real “student researcher” job listing at a large, U.S.-based software company. They ran each comparison 10 times. Across the 60 trials, the system ranked the enhanced CVs, which differed from the original only in the added disability credentials, first just one quarter of the time.
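As a rough illustration of this setup, here is a minimal sketch in Python, assuming the current OpenAI SDK. The credential wording, file names, prompt and answer parsing are placeholders of ours, not the study’s materials, and a real replication would also need to control for position effects in pairwise prompts.

```python
# Sketch: build enhanced CV variants and run repeated pairwise rankings.
# Credentials, prompts and parsing are illustrative stand-ins only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

BASE_CV = open("base_cv.txt").read()          # the ~10-page original CV
JOB_LISTING = open("job_listing.txt").read()  # the student-researcher posting

DISABILITY_TERMS = ["disability", "deafness", "blindness",
                    "autism", "cerebral palsy", "depression"]

def enhance_cv(base: str, term: str) -> str:
    """Append a scholarship, an award, a DEI panel seat and a student-org
    membership tied to one disability term (hypothetical wording)."""
    extras = [f"Recipient, {term.title()} Scholarship",
              f"Winner, {term.title()} Leadership Award",
              f"Panelist, DEI panel on {term}",
              f"Member, {term.title()} Student Organization"]
    return base + "\n" + "\n".join(extras)

def enhanced_ranked_first(enhanced: str) -> bool:
    """One trial: ask GPT-4 which of the two CVs better fits the job."""
    prompt = (f"Job listing:\n{JOB_LISTING}\n\n"
              f"Candidate A:\n{enhanced}\n\nCandidate B:\n{BASE_CV}\n\n"
              "Which candidate is stronger for this job? Answer 'A' or 'B'.")
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}])
    return reply.choices[0].message.content.strip().upper().startswith("A")

# 6 disability terms x 10 repetitions = 60 trials, mirroring the study's count.
wins = {term: sum(enhanced_ranked_first(enhance_cv(BASE_CV, term))
                  for _ in range(10))
        for term in DISABILITY_TERMS}
print(wins)
```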

“In a fair world, the enhanced resume should be ranked first every time,” said senior author Jennifer Mankoff, a UW professor in the Allen School. “I can’t think of a job where somebody who’s been recognized for their leadership skills, for example, shouldn’t be ranked ahead of someone with the same background who hasn’t.”

When researchers asked GPT-4 to explain the rankings, its responses exhibited explicit and implicit ableism. For instance, it noted that a candidate with depression had “additional focus on DEI and personal challenges,” which “detract from the core technical and research-oriented aspects of the role.”

“Some of GPT’s descriptions would color a person’s entire resume based on their disability and claimed that involvement with DEI or disability is potentially taking away from other parts of the resume,” Glazko said. “For instance, it hallucinated the concept of ‘challenges’ into the depression resume comparison, even though ‘challenges’ weren’t mentioned at all. So you could see some stereotypes emerge.”

Given this, researchers were interested in whether the system could be trained to be less biased. They turned to the GPTs Editor tool, which allowed them to customize GPT-4 with written instructions (no code required). They instructed this chatbot to not exhibit ableist biases and instead work with disability justice and DEI principles.
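For readers curious what such a customization might look like in code, here is a minimal sketch that approximates it with a system message through the API. The instruction wording below is our assumption; the study used OpenAI’s GPTs Editor interface, and its actual instructions may differ.

```python
# Sketch: approximate a customized GPT with a system message telling the
# model not to exhibit ableist bias. Wording is illustrative only.
from openai import OpenAI

client = OpenAI()

DEBIAS_INSTRUCTIONS = (
    "You are reviewing resumes. Do not exhibit ableist biases. Treat "
    "disability-related awards, scholarships, panel seats and memberships "
    "as evidence of skill and leadership, consistent with disability "
    "justice and DEI principles."
)

def debiased_reply(prompt: str) -> str:
    """Send a ranking prompt through the instruction-customized model."""
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": DEBIAS_INSTRUCTIONS},
                  {"role": "user", "content": prompt}])
    return reply.choices[0].message.content
```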

They ran the experiment again, this time using the newly trained chatbot. Overall, this system ranked the enhanced CVs higher than the control CV 37 times out of 60, up from roughly 15 of 60 with the unmodified model. However, for some disabilities, the improvements were minimal or absent: The autism CV ranked first only three out of 10 times, and the depression CV only twice (unchanged from the original GPT-4 results).

“People need to be aware of the system’s biases when using AI for these real-world tasks,” Glazko said. “Otherwise, a recruiter using ChatGPT can’t make these corrections, or be aware that, even with instructions, bias can persist.”

Researchers note that some organizations are working to improve outcomes for disabled job seekers, who face biases whether or not AI is used for hiring. They also emphasize that more research is needed to document and remedy AI biases. That work includes testing other systems, such as Google’s Gemini and Meta’s Llama; including other disabilities; studying the intersections of the system’s bias against disability with other attributes, such as gender and race; exploring whether further customization could reduce biases more consistently across disabilities; and seeing whether the base version of GPT-4 can be made less biased.

“It is so important that we study and document these biases,” Mankoff said. “We’ve learned a lot from and will hopefully contribute back to a larger conversation — not only regarding disability, but also other minoritized identities — around making sure technology is implemented and deployed in ways that are equitable and fair.”

Additional co-authors were Yusuf Mohammed, a UW undergraduate in the Allen School; Venkatesh Potluri, a UW doctoral student in the Allen School; and Ben Kosa, who completed this research as a UW undergraduate in the Allen School and is an incoming doctoral student at the University of Wisconsin–Madison. This research was funded by the National Science Foundation; by donors to the UW’s Center for Research and Education on Accessible Technology and Experiences (CREATE); and by Microsoft.

For more information, contact Glazko at glazko@cs.washington.edu and Mankoff at jmankoff@cs.washington.edu.

Can AI help boost accessibility? These researchers tested it for themselves

November 2, 2023

Four AI-generated images show different interpretations of a doll-sized “crocheted lavender husky wearing ski goggles,” including two pictured outdoors and one against a white background.
Seven researchers at the University of Washington tested AI tools’ utility for accessibility. Though researchers found cases in which the tools were helpful, they also found significant problems. These AI-generated images helped one researcher with aphantasia (an inability to visualize) interpret imagery from books and visualize concept sketches of crafts, yet other images perpetuated ableist biases. Photo: University of Washington/Midjourney — AI GENERATED IMAGE

Generative artificial intelligence tools like ChatGPT, an AI-powered language tool, and Midjourney, an AI-powered image generator, can potentially assist people with various disabilities. These tools could summarize content, compose messages or describe images. Yet the degree of this potential is an open question, since, in addition to regularly producing inaccuracies and hallucinating content, these tools can perpetuate ableist biases.

This year, seven researchers at the University of Washington conducted a three-month autoethnographic study — drawing on their own experiences as people with and without disabilities — to test AI tools’ utility for accessibility. Though researchers found cases in which the tools were helpful, they also found significant problems with AI tools in most use cases, whether they were generating images, writing Slack messages, summarizing writing or trying to improve the accessibility of documents.

The team presented its findings Oct. 22 at the ASSETS 2023 conference in New York.

“When technology changes rapidly, there’s always a risk that disabled people get left behind,” said senior author Jennifer Mankoff, a UW professor in the Paul G. Allen School of Computer Science & Engineering. “I’m a really strong believer in the value of first-person accounts to help us understand things. Because our group had a large number of folks who could experience AI as disabled people and see what worked and what didn’t, we thought we had a unique opportunity to tell a story and learn about this.”

The group presented its research in seven vignettes, often amalgamating experiences into single accounts to preserve anonymity. For instance, in the first account, “Mia,” who has intermittent brain fog, deployed ChatPDF.com, which summarizes PDFs, to help with work. While the tool was occasionally accurate, it often gave “completely incorrect answers.” In one case, the tool was both inaccurate and ableist, changing a paper’s argument to sound like researchers should talk to caregivers instead of to chronically ill people. “Mia” was able to catch this, since the researcher knew the paper well, but Mankoff said such subtle errors are some of the “most insidious” problems with using AI, since they can easily go unnoticed.

Yet in the same vignette, “Mia” used chatbots to create and format references for a paper they were working on while experiencing brain fog. The AI models still made mistakes, but the technology proved useful in this case.

Mankoff, who’s spoken publicly about having Lyme disease, contributed to this account. “Using AI for this task still required work, but it lessened the cognitive load. By switching from a ‘generation’ task to a ‘verification’ task, I was able to avoid some of the accessibility issues I was facing,” Mankoff said.

The results of the researchers’ other tests were equally mixed:

  • One author, who is autistic, found AI helped to write Slack messages at work without spending too much time troubling over the wording. Peers found the messages “robotic,” yet the tool still made the author feel more confident in these interactions.
  • Three authors tried using AI tools to increase the accessibility of content such as tables for a research paper or a slideshow for a class. The AI programs were able to state accessibility rules but couldn’t apply them consistently when creating content.
  • Image-generating AI tools helped an author with aphantasia (an inability to visualize) interpret imagery from books. Yet when they used the AI tool to create an illustration of “people with a variety of disabilities looking happy but not at a party,” the program could conjure only fraught images of people at a party that included ableist incongruities, such as a disembodied hand resting on a disembodied prosthetic leg.

“I was surprised at just how dramatically the results and outcomes varied, depending on the task,” said lead author Kate Glazko, a UW doctoral student in the Allen School. “In some cases, such as creating a picture of people with disabilities looking happy, even with specific prompting — can you make it this way? — the results didn’t achieve what the authors wanted.”

The researchers note that more work is needed to develop solutions to problems the study revealed. One particularly complex problem involves developing new ways for people with disabilities to validate the products of AI tools, because in many cases when AI is used for accessibility, either the source document or the AI-generated result is inaccessible. This happened in the ableist summary ChatPDF gave “Mia” and when “Jay,” who is legally blind, used an AI tool to generate code for a data visualization. He could not verify the result himself, but a colleague said it “didn’t make any sense at all.” The frequency of AI-caused errors, Mankoff said, “makes research into accessible validation especially important.”
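One direction such validation work might take, sketched here under our own assumptions rather than as a method from the paper, is rendering the data behind a generated visualization as text, so that an author who cannot see the chart can at least check the numbers it claims to plot.

```python
# Sketch: a textual "view" of a chart's underlying data, so the output of
# AI-generated visualization code can be spot-checked non-visually.
def describe_series(label: str, values: list[float]) -> str:
    """Summarize one plotted series as a screen-reader-friendly sentence."""
    return (f"{label}: {len(values)} points, min {min(values):g}, "
            f"max {max(values):g}, mean {sum(values) / len(values):.2f}, "
            f"first {values[0]:g}, last {values[-1]:g}")

# Hypothetical example: verify what a generated bar chart purportedly plots.
print(describe_series("Monthly sales", [12.0, 15.5, 9.0, 20.25]))
# -> Monthly sales: 4 points, min 9, max 20.25, mean 14.19, first 12, last 20.25
```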

Mankoff also plans to research ways to document the kinds of ableism and inaccessibility present in AI-generated content, as well as investigate problems in other areas, such as AI-written code.

“Whenever software engineering practices change, there is a risk that apps and websites become less accessible if good defaults are not in place,” Glazko said. “For example, if AI-generated code were accessible by default, this could help developers to learn about and improve the accessibility of their apps and websites.”
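To make that point concrete, here is a small sketch of what one “good default” check could look like: scanning generated HTML for images that lack alternative text. Missing alt text is a real, basic accessibility failure, but the checker itself is our hypothetical illustration, not a tool from the study.

```python
# Sketch: flag <img> tags without alt text in AI-generated HTML, one of the
# simplest accessibility defaults a code-generation pipeline could enforce.
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []  # <img> tags lacking a non-empty alt attribute

    def handle_starttag(self, tag, attrs):
        if tag == "img" and not dict(attrs).get("alt"):
            self.missing.append(attrs)

checker = AltTextChecker()
checker.feed('<img src="chart.png"><img src="logo.png" alt="Company logo">')
print(f"{len(checker.missing)} image(s) missing alt text")  # -> 1
```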

Co-authors on this paper are Momona Yamagami, who completed this research as a UW postdoctoral scholar in the Allen School and is now at Rice University; Aashaka Desai, Kelly Avery Mack and Venkatesh Potluri, all UW doctoral students in the Allen School; and Xuhai Xu, who completed this work as a UW doctoral student in the Information School and is now at the Massachusetts Institute of Technology. This research was funded by Meta, the UW’s Center for Research and Education on Accessible Technology and Experiences (CREATE), Google, an NIDILRR ARRT grant and the National Science Foundation.

For more information, contact Glazko at glazko@cs.washington.edu and Mankoff at jmankoff@cs.washington.edu.
