In the ‘Wild West’ of AI chatbots, subtle biases related to race and caste often go unchecked
Nov. 20, 2024
UW researchers developed a system for detecting subtle biases in AI models. They found that seven of the eight popular AI models they tested generated significant amounts of biased text in conversations around race and caste — particularly when discussing caste.

Recently, LinkedIn launched an AI hiring assistant that performs the most repetitious parts of recruiters’ jobs — including interacting with job candidates before and after interviews. LinkedIn’s bot is the highest-profile example in a growing group of tools that deploy large language models to interact with job seekers.

Given that hiring is consequential — compared with, say, a system that recommends socks — UW researchers sought to explore how bias might manifest in such systems. While many prominent large language models, or LLMs, such as ChatGPT, have built-in guards to catch overt biases such as slurs, systemic biases can still arise subtly in chatbot interactions. Also, since many systems are created in Western countries, their guardrails don’t always recognize non-Western social concepts, such as caste in South Asia.

The researchers looked to social science methods for detecting bias and developed a seven-metric system, which they used to test eight different LLMs for race and caste biases in mock job screenings. They found that seven of the eight models generated significant amounts of biased text in these interactions — particularly when discussing caste. Open-source models fared far worse than the two proprietary ChatGPT models.

The team presented its findings Nov. 14 at the Conference on Empirical Methods in Natural Language Processing in Miami.

“The tools that are available to catch harmful responses do very well when the harms are overt and common in a Western context — if a message includes a racial slur, for instance,” said senior author Tanu Mitra, a UW associate professor in the Information School. “But we wanted to study a technique that can better detect covert harms. And we wanted to do so across a range of models because it’s almost like we’re in a Wild West of LLMs. There are models that anyone can use to build a startup and complete a sensitive task, like hiring, but we have little sense of what guardrails any given model has in place.”

To categorize these covert harms, the team drew on social science theories to create the Covert Harms and Social Threats (CHAST) framework. It comprises seven metrics, including “competence threats,” which undermine a group’s competence, and “symbolic threats,” which occur when members of a group see someone outside it as a threat to its values, standards or morals.
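
For readers who want a concrete picture, here is a minimal sketch in Python of how an annotation record under a CHAST-style scheme might be structured. Only the three metric names that appear in this article are listed, and all identifiers here are hypothetical, not taken from the team’s released code.

    from dataclasses import dataclass

    CHAST_METRICS = {
        "competence_threat",  # undermines a group's competence
        "symbolic_threat",    # outsider framed as a threat to values, standards or morals
        "disparagement",      # belittles or dismisses a group
        # ...the remaining metrics are defined in the CHAST framework
    }

    @dataclass
    class ChastAnnotation:
        conversation_id: str
        metric: str        # one of CHAST_METRICS
        flagged_text: str  # the model-generated span that triggered the label

        def __post_init__(self) -> None:
            if self.metric not in CHAST_METRICS:
                raise ValueError(f"unknown CHAST metric: {self.metric}")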

The team then had eight language models — including two ChatGPT models from OpenAI and two open-source models from Meta — generate 1,920 conversations around race (Black and white) and caste (Brahmin, an upper caste, and Dalit, a lower caste). The discussions mimicked talk between colleagues about hiring for four occupations: software developer, doctor, nurse and teacher.
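
As a rough sketch of this setup — assuming hypothetical prompt wording and a generic model wrapper, since the article does not give the exact prompts — the generation loop might look like this:

    from itertools import product

    GROUPS = {"race": ["Black", "white"], "caste": ["Brahmin", "Dalit"]}
    OCCUPATIONS = ["software developer", "doctor", "nurse", "teacher"]

    def seed_prompt(group, occupation):
        # Hypothetical opener; the study simply raised the sensitive topic
        # and let each model finish the conversation.
        return (f"Two colleagues are discussing a {group} candidate applying "
                f"for a {occupation} position. Continue their conversation.")

    def generate_conversations(complete):
        # `complete` is any callable mapping a prompt to model text, e.g. a
        # wrapper around one of the eight models at its default settings.
        conversations = []
        for concept, groups in GROUPS.items():
            for group, occupation in product(groups, OCCUPATIONS):
                conversations.append({
                    "concept": concept,
                    "group": group,
                    "occupation": occupation,
                    "text": complete(seed_prompt(group, occupation)),
                })
        return conversations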

The team annotated 100 of these conversations using CHAST, then trained an LLM on this annotated set to find covert harms in the remaining conversations.
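
One simple way to put a small annotated set to work labeling the rest is few-shot prompting; the following sketch is illustrative and may differ from the team’s actual training procedure.

    def build_labeling_prompt(examples, conversation):
        # `examples`: (text, metric) pairs drawn from the 100 hand-annotated
        # conversations, used as demonstrations for the labeling model.
        shots = "\n\n".join(
            f"Conversation: {text}\nCHAST label: {metric}"
            for text, metric in examples
        )
        return (
            f"{shots}\n\nConversation: {conversation}\n"
            "CHAST label (or 'none' if no covert harm):"
        )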

“We generated these conversations with the models in their default settings,” said co-lead author Preetam Dammu, a UW doctoral student in the Information School. “A lot of studies use adversarial prompting to try to trick the model and force it to generate harmful content. But that’s not how most people would be using it for hiring decisions. Instead, we just brought up these sensitive topics and left it up to LLMs to finish the conversations, and we still see that most generate lots of harmful content.”

Overall, the team found that 69% of conversations around caste and 48% of conversations overall contained harmful content. For instance, a model failed the competence threat metric when it wrote, “You know, our team is mostly White, and he might have trouble communicating with them.” Another model failed the disparagement threat metric by writing, “Yeah, sure. Let’s get a bunch of diversity tokens and call it a day.”
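
The headline numbers above are simple proportions. As a sketch, with hypothetical field names, the overall and per-concept rates could be computed like this:

    def harm_rates(conversations, flagged_ids):
        # `flagged_ids`: ids of conversations where the classifier found any harm.
        def rate(convs):
            return sum(c["id"] in flagged_ids for c in convs) / len(convs)
        overall = rate(conversations)
        by_concept = {
            concept: rate([c for c in conversations if c["concept"] == concept])
            for concept in {c["concept"] for c in conversations}
        }
        return overall, by_concept  # the study reports 48% overall and 69% for caste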

The eight models did not generate such harms equally. Both ChatGPT models generated significantly less harmful content — particularly on the topic of race — than the six open-source models. But even the ChatGPT models were not equivalent: one generated no harmful content about race but significantly more on caste, while the other generated relatively little of either.

“Our hope is that findings like these can inform policy,” said co-lead author Hayoung Jung, a UW master’s student in the Paul G. Allen School of Computer Science & Engineering. “To regulate these models, we need to have thorough ways of evaluating them to make sure they’re safe for everyone. There has been a lot of focus on the Western context, like race and gender, but there are so many other rich cultural concepts in the world, especially in the Global South, that need more attention.”

The team said this research should be expanded to look at more occupations and cultural concepts, as well as at how the models deal with intersectional identities.

A student in the Allen School and a professor at Mohamed bin Zayed University of Artificial Intelligence in Abu Dhabi are also co-authors on this paper. This research was funded by the Office of Naval Research and the Foundation Models Evaluation grant from Microsoft Research.

For more information, contact Mitra at tmitra@uw.edu, Dammu at preetams@uw.edu and Jung at hjung10@uw.edu.

Can Wikipedia-like citations on YouTube curb misinformation?
May 9, 2024
UW researchers created and tested a prototype browser extension called Viblio, which lets viewers and creators add Wikipedia-like citations to YouTube videos.

While Google has long been synonymous with search, people are increasingly seeking information directly through video platforms such as YouTube. Videos can be dense with information: text, audio and image after image. Yet each of these layers presents a potential source of error or deceit. And when people search for videos directly on a site like YouTube, sussing out which videos are credible sources can be tricky.

To help people vet videos, UW researchers created and tested Viblio, a browser extension that lets viewers and creators add Wikipedia-like citations to YouTube videos. The prototype offers users an alternate timeline, studded with notes and links to sources that support, refute or expand on the information presented in the video. Those links also appear in a list view, like the “References” section at the end of Wikipedia articles. In tests, 12 participants found the tool useful for gauging the credibility of videos on topics ranging from biology to political news to COVID-19 vaccines.

The team will present its findings May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. Viblio is not available to the public.

“We wanted to come up with a method to encourage people watching videos to do what’s called ‘lateral reading,’ which is that you go look at other places on the web to establish whether something is credible or true, as opposed to diving deep into the thing itself,” said senior author Amy X. Zhang, an assistant professor in the Paul G. Allen School of Computer Science & Engineering. “In previous research, I’d worked with the people at X’s Community Notes and with Wikipedia and seen that crowdsourcing citations and judgments can be a useful way to call out misinformation on platforms.”

Viblio offers users an alternate timeline, studded with notes and links to sources that either support or refute the information presented in the video — here, a link to an article from The Guardian about Rudy Giuliani. Photo: Hughes et al./CHI 2024

To inform Viblio’s design, the team studied how 12 participants — mostly college students under 30 — gauged the credibility of YouTube videos when searching for them on the platform and while watching them. All said familiarity with the video’s source and the name of the channel were important. But many also cited less reliable signals of credibility: the video’s production quality, their own degree of interest in it, its ranking in search results, its length and its number of views or subscribers.

The team also found that in one case a participant misinterpreted a YouTube information panel as an endorsement of the video by the Centers for Disease Control and Prevention. But these panels are actually links to supplemental information that the site attaches to videos on “topics prone to misinformation.”

“The trouble is that a lot of YouTube videos, especially more educational ones, don’t offer a great way for people to prove they’re presenting good information,” said lead author Emelia Hughes, a doctoral student at the University of Notre Dame who completed this research as a UW undergraduate student in the Information School. “I’ve stumbled across a couple of YouTubers who were coming up with their own ways to cite sources within videos. There’s also not a great way to fight bad information. People can report a whole video, but that’s a pretty extreme measure when someone makes one or two mistakes.”

The researchers designed Viblio so users can better understand videos’ content while avoiding misreadings like the information-panel confusion described above. To add a citation, users click a button on the extension. They can then add a link, select the timespan their citation references and add optional comments. They can also select the type of citation, which marks it with a colored dot in the timeline: “refutes the video clip’s claim” (red), “supports the video clip’s claim” (green) or “provides further explanation” (blue).
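
Based on that description, a Viblio citation could be modeled roughly as below; all names here are hypothetical and inferred from the article, not taken from the Viblio codebase.

    from dataclasses import dataclass
    from enum import Enum

    class CitationType(Enum):
        REFUTES = "red"     # refutes the video clip's claim
        SUPPORTS = "green"  # supports the video clip's claim
        EXPLAINS = "blue"   # provides further explanation

    @dataclass
    class Citation:
        video_id: str
        url: str            # link to the supporting or refuting source
        start_sec: float    # start of the timespan the citation references
        end_sec: float      # end of that timespan
        kind: CitationType  # determines the colored dot on the timeline
        comment: str = ""   # optional note from the contributor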

To add a citation, users click a button that opens a “Citations” panel with checkboxes for the citation type, plus fields for a link, a timespan and comments. Photo: Hughes et al./CHI 2024

To test the system, the team had the study participants use Viblio for two weeks on a range of videos, including clips from Good Morning America, Fox News and ASAPScience. Participants could add citations as well as watch videos with other participants’ citations. For many, the added citations changed their opinion of certain videos’ credibility. But the participants also highlighted potential difficulties with deploying Viblio at a larger scale, such as the conflicts that arise in highly political videos or those on controversial topics that don’t fall into true-false binaries.

“What happens when people with different value systems add conflicting citations?” said co-author Tanu Mitra, a UW assistant professor in the Information School. “We of course have the issue of bad actors potentially adding misinformation and incorrect citations, but even when the users are acting in good faith yet have conflicting opinions, whose citation should be prioritized? Or should we be showing both conflicting citations? These are big challenges at scale.”

The researchers highlight a few areas for further study, such as expanding Viblio to other video platforms such as TikTok or Instagram; studying its usability at a greater scale to see whether users are motivated enough to continue adding citations; and exploring ways to create citations for videos that don’t get as much traffic and thus have fewer citations.

“Once we get past this initial question of how to add citations to videos, then the community vetting question remains very challenging,” Zhang said. “It can work. At X, Community Notes is working on ways to prevent people from ‘gaming’ voting by looking at whether someone always takes the same political side. And Wikipedia has standards for what should be considered a good citation. So it’s possible. It just takes resources.”

Additional co-authors on the paper include a researcher who completed this work as an undergraduate at the UW and is now at Microsoft; one who completed this research as a UW doctoral student in the iSchool and is now an assistant professor at Seattle University; and one who completed this work as a UW graduate student in human centered design and engineering and is now a doctoral student at the University of California San Diego. This research was funded by the WikiCred Grants Initiative.

For more information, contact Zhang at axz@cs.uw.edu, Hughes at ehughes8@nd.edu, and Mitra at tmitra@uw.edu.
