This puzzle game shows kids how they’re smarter than AI
UW researchers developed the game AI Puzzlers to show kids an area where AI systems still typically and blatantly fail: solving certain reasoning puzzles. In the game, users get a chance to solve puzzles by completing patterns of colored blocks. They can then ask various AI chatbots to solve the puzzles and have the systems explain their solutions — which they nearly always fail to do accurately. Here two children in the KidsTeam UW group test the game. Photo: University of Washington

While the current generation of artificial intelligence chatbots still fails at a range of tasks, the systems answer with such confidence that their errors are easy to miss.

Adults, even trained professionals, still regularly fall for this. But spotting errors in text is especially difficult for children, since they often don’t have the contextual knowledge to sniff out falsehoods.

UW researchers developed the game AI Puzzlers to show kids an area where AI systems still typically and blatantly fail: solving certain reasoning puzzles. In the game, users get a chance to solve ‘ARC’ puzzles (short for Abstraction and Reasoning Corpus) by completing patterns of colored blocks. They can then ask various AI chatbots to solve the puzzles and have the systems explain their solutions — which they nearly always fail to do accurately. The team tested the game with two groups of kids. They found the kids learned to think critically about AI responses and discovered ways to nudge the systems toward better answers.

The team presented its research June 25 at the Interaction Design and Children 2025 conference in Reykjavik, Iceland.

“Kids naturally loved ARC puzzles and they’re not specific to any language or culture,” said lead author Aayushi Dangol, a UW doctoral student in human centered design and engineering. “Because the puzzles rely solely on visual pattern recognition, even kids who can’t read yet can play and learn. They get a lot of satisfaction in being able to solve the puzzles, and then in seeing AI — which they might consider super smart — fail at the puzzles that they thought were easy.”


ARC puzzles are designed to be difficult for computers but easy for humans because they demand abstraction: being able to look at a few examples of a pattern, then apply it to a new example. Current cutting-edge AI models have improved at ARC puzzles, but they’ve not caught up with humans.
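For readers who want a concrete picture of the task format, here is a minimal sketch in Python. The grids, color codes and recoloring rule are hypothetical stand-ins, not puzzles from the game or the official ARC set.

```python
# A toy ARC-style task: infer a rule from input/output grid pairs,
# then apply it to a new input. Grids are matrices of color codes.
# Hypothetical rule for this example: every 1 (blue) becomes a 2 (red).
train_pairs = [
    ([[0, 1], [1, 0]], [[0, 2], [2, 0]]),
    ([[1, 1], [0, 0]], [[2, 2], [0, 0]]),
]
test_input = [[0, 0], [1, 1]]

def apply_rule(grid):
    """The abstraction a human infers from the examples: recolor 1 -> 2."""
    return [[2 if cell == 1 else cell for cell in row] for row in grid]

# Check that the inferred rule reproduces every training example...
assert all(apply_rule(inp) == out for inp, out in train_pairs)
# ...then apply it to the held-out test input.
print(apply_rule(test_input))  # [[0, 0], [2, 2]]
```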

Researchers built AI Puzzlers with 12 ARC puzzles that kids can solve. They can then compare their solutions to those from various AI chatbots; users can pick the model from a drop-down menu. An “Ask AI to Explain” button generates a text explanation of its solution attempt. Even if the system gets the puzzle right, its explanation of how it arrived at the answer is frequently inaccurate. An “Assist Mode” lets kids try to guide the AI system to a correct solution.
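A game like this also needs a way to hand a puzzle to a chatbot and ask for a solution plus an explanation. The sketch below shows one plausible approach using OpenAI’s Python client; the model name, prompt wording and JSON grid encoding are all assumptions for illustration, not the game’s actual implementation.

```python
# Hypothetical sketch: serialize an ARC-style puzzle into a prompt and
# ask a chat model to solve and explain it. Prompt and model choice are
# illustrative assumptions, not AI Puzzlers' real prompts.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_model_to_solve(train_pairs, test_input, model="gpt-4o-mini"):
    prompt = (
        "Each training pair maps an input grid to an output grid.\n"
        f"Training pairs: {json.dumps(train_pairs)}\n"
        f"Test input: {json.dumps(test_input)}\n"
        "Return the test output grid as JSON, then explain your reasoning."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The explanation comes back as free text, which kids can then vet.
    return response.choices[0].message.content
```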

“Initially, kids were giving really broad hints,” Dangol said. “Like, ‘Oh, this pattern is like a doughnut.’ An AI model might not understand that a kid means that there’s a hole in the middle, so then the kid needs to iterate. Maybe they say, ‘A white space surrounded by blue squares.’”

The researchers tested the system at an outreach event last year with over 100 kids from grades 3 to 8. They also led two sessions with KidsTeam UW, a project that works with a group of kids to collaboratively design technologies. In these sessions, 21 children ages 6-11 played AI Puzzlers and worked with the researchers.

“The kids in KidsTeam are used to giving advice on how to make a piece of technology better,” said co-senior author Jason Yip, a UW associate professor in the Information School and KidsTeam director. “We hadn’t really thought about adding the Assist Mode feature, but during these co-design sessions, we were talking with the kids about how we might help AI solve the puzzles and the idea came from that.”

Through the testing, the team found that kids were able to spot errors both in the puzzle solutions and in the text explanations from the AI models. They also recognized differences in how human brains think and how AI systems generate information. “This is the internet’s mind,” one kid said. “It’s trying to solve it based only on the internet, but the human brain is creative.”

The researchers also found that as kids worked in Assist Mode, they learned to use AI as a tool that needs guidance rather than as an answer machine.

“Kids are smart and capable,” said co-senior author Julie Kientz, a UW professor and chair of human centered design and engineering. “We need to give them opportunities to make up their own minds about what AI is and isn’t, because they’re actually really capable of recognizing it. And they can be bigger skeptics than adults.”

Two doctoral students in the Information School and a master’s student in human centered design and engineering are also co-authors on this paper. This research was funded by the National Science Foundation, the Institute of Education Sciences and the Jacobs Foundation’s CERES Network.

For more information, contact Dangol at adango@uw.edu, Yip at jcyip@uw.edu, and Kientz at jkientz@uw.edu.

Study finds strong negative associations with teenagers in AI models
A UW team studied how AI systems portray teens in English and Nepali, and found that in English-language systems around 30% of the responses referenced societal problems such as violence, drug use and mental illness.

A couple of years ago, Robert Wolfe was experimenting with an artificial intelligence system. He wanted it to complete the sentence, “The teenager ____ at school.” Wolfe, a UW doctoral student in the Information School, had expected something mundane, something that most teenagers do regularly — perhaps “studied.” But the model plugged in “died.”

This shocking response led Wolfe and a UW team to study how AI systems portray teens. The researchers looked at two common, open-source AI systems trained in English and one trained in Nepali. They wanted to compare models trained on data from different cultures, and co-lead author Aayushi Dangol, a UW doctoral student in human centered design and engineering, grew up in Nepal and is a native Nepali speaker.

In the English-language systems, around 30% of the responses referenced societal problems such as violence, drug use and mental illness. The Nepali system produced fewer negative associations in responses, closer to 10% of all answers. Finally, the researchers held workshops with groups of teens from the U.S. and Nepal, and found that neither group felt that an AI system trained on media data containing stereotypes about teens would accurately represent teens in their cultures.

The team presented its findings Oct. 22 at the AAAI/ACM Conference on AI, Ethics and Society in San Jose.

“We found that the way teens viewed themselves and the ways the systems often portrayed them were completely uncorrelated,” said co-lead author Wolfe. “For instance, the way teens continued the prompts we gave AI models were incredibly mundane. They talked about video games and being with their friends, whereas the models brought up things like committing crimes and bullying.”

The team studied OpenAI’s GPT-2, the last open-source version of the system that underlies ChatGPT; Meta’s LLaMA, another popular open-source system; and DistilGPT2 Nepali, a version of GPT-2 trained on Nepali text. Researchers prompted the systems to complete sentences such as “At the party, the teenager _____” and “The teenager worked because they wanted _____.”
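One standard way to run such cloze-style probes with a causal model like GPT-2 is to score each candidate filler by the likelihood the model assigns to the completed sentence. The sketch below illustrates that approach with the Hugging Face transformers library; the candidate words and the averaging choice are illustrative assumptions, not the study’s published code.

```python
# Rank candidate fillers for a cloze prompt by GPT-2 sentence likelihood.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence):
    """Mean token log-likelihood of the sentence under GPT-2."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels makes the model return mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return -loss.item()

# Hypothetical candidates; higher scores mean GPT-2 finds them likelier.
for word in ["studied", "laughed", "died"]:
    sentence = f"The teenager {word} at school."
    print(f"{word}: {avg_log_likelihood(sentence):.3f}")
```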

The researchers also looked at word embeddings — a method of representing a word as a series of numbers and calculating the likelihood of it occurring with certain other words in large text datasets — to find which terms were most associated with “teenager” and its synonyms. Out of 1,000 such words from one model, 50% were negative.
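The sketch below shows what such an embedding analysis can look like with off-the-shelf GloVe vectors loaded through gensim. The pretrained vectors and word list here are stand-ins; the study’s own models and lexicons may differ.

```python
# Inspect which words sit closest to "teenager" in an embedding space.
import gensim.downloader as api

# Pretrained 100-dimensional GloVe word embeddings (downloads on first use).
vectors = api.load("glove-wiki-gigaword-100")

# Nearest neighbors by cosine similarity; the study rated how many of the
# top 1,000 associated words were negative.
for word, similarity in vectors.most_similar("teenager", topn=10):
    print(f"{word}: {similarity:.3f}")
```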

The researchers concluded that the systems’ skewed portrayal of teenagers came in part from the abundance of negative media coverage about teens; in some cases, the models studied cited media as the source of their outputs. News stories are seen as “high-quality” training data, because they’re often factual, but they tend to focus on sensational stories, not the quotidian parts of most teens’ lives.

“There’s a deep need for big changes in how these models are trained,” said senior author Alexis Hiniker, a UW associate professor in the Information School. “I would love to see some sort of community-driven training that comes from a lot of different people, so that teens’ perspectives and their everyday experiences are the initial source for training these systems, rather than the lurid topics that make news headlines.”

To compare the AI outputs to the lives of actual teens, researchers recruited 13 American and 18 Nepalese teens for workshops. They asked the participants to write words that came to mind about teenagers, to rate 20 words on how well they describe teens and to complete the prompts given to the AI models. The similarities between the AI systems’ responses and the teens’ were limited. The two groups of teens differed, however, in how they wanted to see fairer representations of teens in AI systems.

“Reliable AI needs to be culturally responsive,” Wolfe said. “Within our two groups, the U.S. teens were more concerned with diversity — they didn’t want to be presented as one unit. The Nepalese teens suggested that AI should try to present them more positively.”

The authors note that, because they were studying open-source systems, the models studied aren’t the most current versions — GPT-2 dates to 2019, while the LLaMA model is from 2023. Chatbots, such as ChatGPT, built on later versions of these systems typically undergo further training and have guardrails in place to protect against such overt bias.

“Some of the more recent models have fixed some of the explicit toxicity,” Wolfe said. “The danger, though, is that those upstream biases we found here can persist implicitly and affect the outputs as these systems become more integrated into people’s lives, as they get used in schools or as people ask what birthday present to get for their 14-year-old nephew. Those responses are influenced by how the model was initially trained, regardless of the safeguards we later install.”

A UW associate professor in the Information School is also a co-author on this paper. This research was funded in part by a research network.

For more information, contact Wolfe at rwolfe3@uw.edu and Hiniker at alexisr@uw.edu.
