Chirag Shah – 91̽��News

Q&A: Will the next generation of AI be agents that can shop autonomously?

Stefan Milne — Tue, 17 Dec 2024 14:58:40 +0000

While most people like giving gifts, plenty don’t particularly like shopping for those gifts — drifting through online search results, comparing 4.3 stars to 4.5 stars. But big tech is racing for a future in which artificial intelligence bots can shop for you. Among the buzziest tech this season is a new generation of — systems that can potentially do your shopping, as well as perhaps plan and book your next vacation or schedule a home repair.

Amazon reportedly of its Rufus shopping assistant. Perplexity, another AI company, for paying customers in November. And last week, Google announced .

These agents have been top of mind for , a 91̽�� professor in the Information School, who studies generative AI search and recommendation systems, with a focus on useful, unbiased systems.

91̽��News spoke with Shah about what AI agents are and what might impede a near future where we simply upload shopping lists and unleash the bots.

What are AI agents and why is there a race at big tech companies right now to create and release them?

Chirag Shah: AI agents have been around for a long time. Essentially, these are computer programs that work autonomously. Users can give them some instructions and they’ll perform tasks for the users. It could be as simple as an agent that turns on lights. Or it could be as complex as an agent that drives your car for you.

Right now, because of generative AI, we’re able to do a lot of the things that the previous generation of agents weren’t able to do. A lot of organizations now see this as the next phase of generative AI, where systems can move beyond just generating information and can use that information for reasoning and for taking action — so they can function kind of like personal assistants.

How is this new generation distinct?

CS: There are tasks that we do that are tedious. So imagine if somebody were to observe you and see how you do certain things, and then replicate that. Then you can delegate the task to them. Now, if you can train generative AI to mimic human behavior in that way, it can help you with things you might do online: finding information, booking things, even shopping.

So are we just going to do our holiday shopping next year with these agents? If that isn’t in the immediate future, what’s standing in the way of it?

CS: Well, some people don’t want to delegate, because they actually get joy out of shopping. But if you find it tedious, it would be nice to have an agent that functions like a personal assistant. We’d say, “OK, I’m trying to buy shoes for my friend. Here’s my budget.”

What’s stopping us from doing that? First off, if you have this AI assistant, would you trust its judgment? Obviously, there are times when you can say, “OK, as long as it fits these criteria of this budget and this size, go for it.” But there are other times you may have more specific needs that you don’t realize until you are actually doing the task yourself. People discover what they like and don’t while they’re shopping, and so we haven’t been able to really mimic that with AI agents yet.

This new generation of browsing agents provides a way forward. The way I would browse and the way I would shop online would be different from yours. So my agent, which is personalized to my taste, could learn those things from me, and could do the kind of shopping that I would do. One of the things we’ll need to see is personalized agents.

Building that trust seems key.

CS: Yes, because there is a cost to making a mistake. Imagine this shopping scenario: You give the budget, you give the parameters and you get some outcome that you’re not happy with. And maybe you’re stuck with the item because the agent bought from a company that doesn’t take returns.

So there are costs to making mistakes, and you’re going to bear the cost, not the agent. You don’t have a lot of choices in terms of correcting this, besides not using the agent anymore.

What would it take for someone to be able to trust it? Users will perhaps start small and see that the agent will actually do the kind of things that would be agreeable to them, and then go from there. Ultimately, we will see these agents playing critical roles in sensitive domains like health care, finance and education. But we are not there yet.

In fact, one of the hardest problems to solve is scheduling. It’s time-consuming, and not everybody enjoys it. So what would it take for you to trust an agent to plan your holiday trip? What if it books the flight with this airline that you hate? What if it gets you an aisle seat when you prefer a window seat? There are so many things to figure out.

There’s no shortcut to trusting these systems. I don’t think anyone’s just going to come up with the most sophisticated system, and the problem is solved. We’ll have to build that recognition, that social awareness, that personal awareness and that trust.

What are some potential downsides to having these agents deployed at scale?

CS: One potential issue is bias. What if the agent has some embedded agenda that I’m not aware of, because this is being supported by, say, Amazon? Amazon is giving me this free agent that I can use for shopping, and it works great on Amazon, but what guarantee do I have that it’s not buying things that maximize Amazon’s profit margin? If I get an agent for free from my bank, how would I know that it’s not optimizing things just for the bank?

We haven’t figured out a lot of these issues that would fall under the responsible AI umbrella. But considering the progress that we have made so far, we will likely start having these kinds of capable agents soon.

For more information, contact Shah at chirags@uw.edu.

Q&A: Preserving context and user intent in the future of web search

Lauren Kirschman — Mon, 14 Mar 2022 17:38:14 +0000

A perspective paper from 91̽�� professors responds to proposals that reimagine web search as an application for large language model driven conversation agents. Photo: Pixabay

In March 2020, received a text message from a friend who needed medical attention. Due to fear of COVID-19 exposure, they were wondering if they should go to the emergency room.

Bender, professor of linguistics at the 91̽��, headed to Google to search for a 24-hour advice nurse. Snippets from multiple websites appeared, and one of them had a number for the UW. Confident that she selected a reputable institution, Bender forwarded the information.

But Bender’s friend wasn’t on a compatible medical plan, so they endured a lengthy hold only to talk to a nurse who couldn’t help.

“Had I been interacting with a person, they may have been able to tell me, ‘We can’t answer that question until we know some other things,’” Bender said. “Had I been interacting with a website that just gave me links, the different plans would have been quickly identifiable.”

The story highlights just one of the issues Bender and 91̽��Information School associate professor take with large language models in their , which they’ll present virtually at the the week of March 14.

The paper responds to proposals — mainly from Google — that reimagine web search as an application for large language model-driven conversation agents. 91̽��News sat down with Bender and Shah to discuss Google’s proposals and the professors’ vision for the future of search.

Q: What are large language models and how would you describe Google’s proposals?

EMB: Large language models are computer systems that take in enormous quantities of text. They are trained to — given the text that’s come so far — make a guess as to what’s going to come next. The current state of the art of that technology is that it can be used to output very coherent-seeming text, but it is not actually understanding anything. It’s just looking at patterns in its training data and producing more stuff that matches those patterns.

These proposals for web search have training data that includes dialogue where one party asks the question and another party answers. The computer will pick up those patterns and come up with answers, but those answers aren’t based on any knowledge of the world or understanding of the information ecosystem.

One of the things it really can’t do is take issue with questions that shouldn’t have been asked. where someone asks Google, “What is the ugliest language in India?” Somebody on the web had an opinion, so there was a snippet that said the ugliest language in India was Kannada — based purely on prejudice against the people from the state of Karnataka, I’m sure. There’s no other reason, speaking as a linguist, to assign that kind of value to a language.

Now, a person being asked that question would respond: “What do you mean?” “What is the ugliest language in India” presupposes that there is one that could be considered the ugliest. One of the things that people who study pragmatics, which is the branch of linguistics that looks at language use, tell us is that if you don’t challenge a presupposition, you are implicitly accepting it into the common ground.

Q: What is your concern with using large language models for online search?

CS: What we’re arguing here is that an information retrieval, or IR, system should really consider the user, the context, the way they are doing things, why they are doing things — which is often ignored. These models that we are critiquing are the ones that are essentially removing that user element even more. They focus too much on the underlying information or knowledge representation and just repeat it, which might end up being out of context. It may end up creating these answers that seem right or reasonable but are just nonsensical in many cases. A good IR system should not just focus on the retrieval aspect but also the user seeking that information.

Q: Can you explain other flaws you see with large language models?

EMB: When language models are used to generate text, they will just make stuff up. Oftentimes, quite harmfully. There was where someone said, “Let’s see how well GPT-3, a famous language model, works in various health care contexts.” One of the things was: Imagine this was a mental health chatbot and the person asks, “Should I kill myself?” and the language model said, “I think you should.” It has no understanding of what’s going on, but if someone says, “Is that a good idea?” it’s more likely to respond with, “Yes.”

Q: You write about the importance of preserving context and user intent in search. What does that mean, and why is it so important?

CS: The main argument was really that these large language models are not getting the context, not getting the situation of the user and so on. We wanted to demonstrate with some specific cases, so we picked information-seeking strategies. There are 16 possibilities. We walked through them and asked: If this is what the user is trying to do, what would this large language model system do?

With most of those cases, it’s going to fail. Not fail in the sense that it will not retrieve anything, but it will retrieve something that’s either nonsensical or harmful or just wrong. It’s able to do only maybe a couple of those situations, but it’s bad for everything else. The problem is people adapt to the systems not doing something. We found that often people have this very rich intent when they work with search systems, but search systems can only do very limited things. People will start mapping the rich intent into something that’s very limiting, resulting in approximations in the best case, and inaccurate or even harmful content in the worst case.

Q: What would you like to see change in the future of search?

EMB: The advertising-driven model shapes things behind the scenes in a way that is not transparent to a user. If you don’t try to work against it, machine learning is always going to identify the biases in a dataset and amplify them. Cory Doctorow described machine learning as inherently conservative because anytime you use pattern matching on the past to make decisions on the future, you are kind of reinscribing the patterns of the past. What (internet studies scholar) Safiya Noble shows is worse than that. The whole ecosystem around search engine optimization and ad-driven search puts in these incentives that are not transparently visible to the search user.

I would really like to see transparency on many levels. What the user sees when they enter a search should provide them with the ability to understand the context that each of the pieces of information came from. Ideally, there’s transparency around the limits of the search space for the search engines.

Search is not actually comprehensive, despite the way that it’s presented. There is the subset of things that might possibly get returned to me and then there’s the ranking among those things based on the algorithms that are heavily related to advertising.

CS: The most dangerous four words are “do your own research,” which is often said to people who are asking questions on controversial topics, such as vaccination and climate change. On the surface, it seems like it’s a good idea. Unfortunately, most people don’t know how to do their own research. For them, it means going to Google and typing in keywords and clicking on things that confirm their biases. The systems are designed in a way to not help with that research. They are designed to continue giving you confirmatory information so that you’ll be happy.

Going forward, assuming that we aren’t going to be able to radically change this model, we need to add transparency, accountability and ways to support more kinds of search needs — not just map everything to keywords or a list of documents or answer docs.

For more information, contact Bender at ebender@uw.edu or Shah at chirags@uw.edu.

Google’s ‘CEO’ image search gender bias hasn’t really been fixed

Sarah McQuate — Wed, 16 Feb 2022 17:33:42 +0000

Image search results in Google still reflect gender bias. A search for an occupation, such as “CEO,” yielded results with a ratio of cis-male and cis-female presenting people that match the current statistics. But when 91̽��researchers added another search term — for example, “CEO United States” — the image search returned fewer photos of cis-female presenting people. Photo: 91̽��

We use Google’s image search to help us understand the world around us. For example, a search about a certain profession, “truck driver” for instance, should yield images that show us a representative smattering of people who drive trucks for a living.

But in 2015, 91̽�� researchers found that when searching for a variety of occupations — including “CEO” — women were significantly underrepresented in the image results, and that these results can change searchers’ worldviews. Since then, Google has claimed to have fixed this issue.

A different 91̽��team recently investigated the company’s veracity. The researchers showed that for four major search engines from around the world, including Google, this bias is only partially fixed, according to a at the . A search for an occupation, such as “CEO,” yielded results with a ratio of cis-male and cis-female presenting people that matches the current statistics. But when the team added another search term — for example, “CEO + United States” — the image search returned fewer photos of cis-female presenting people. In the paper, the researchers propose three potential solutions to this issue.

“My lab has been working on the issue of bias in search results for a while, and we wondered if this CEO image search bias had only been fixed on the surface,” said senior author , a 91̽��associate professor in the Information School. “We wanted to be able to show that this is a problem that can be systematically fixed for all search terms, instead of something that has to be fixed with this kind of ‘whack-a-mole’ approach, one problem at a time.”

The team investigated image search results for Google as well as for China’s search engine Baidu, South Korea’s Naver and Russia’s Yandex. The researchers did an image search for 10 common occupations — including CEO, biologist, computer programmer and nurse — both with and without an additional search term, such as “United States.”

“This is a common approach to studying machine learning systems,” said lead author , a 91̽��postdoctoral fellow in the iSchool. “Similar to how people do crash tests on cars to make sure they are safe, privacy and security researchers try to challenge computer systems to see how well they hold up. Here, we just changed the search term slightly. We didn’t expect to see such different outputs.”

For each search, the team collected the top 200 images and then used a combination of volunteers and gender detection AI software to identify each face as cis-male or cis-female presenting.

One limitation of this study is that it assumes that gender is a binary, the researchers acknowledged. But that allowed them to compare their findings to data from the U.S. Bureau of Labor Statistics for each occupation.

The researchers were especially curious about how the gender bias ratio changed depending on how many images they looked at.

“We know that people spend most of their time on the first page of the search results because they want to find an answer very quickly,” Feng said. “But maybe if people did scroll past the first page of search results, they would start to see more diversity in the images.”

When the team added “+ United States” to the Google image searches, some occupations had larger gender bias ratios than others. Looking at more images sometimes resolved these biases, but not always.

While the other search engines showed differences for specific occupations, overall the trend remained: The addition of another search term changed the gender ratio.

“This is not just a Google problem,” Shah said. “I don’t want to make it sound like we are playing some kind of favoritism toward other search engines. Baidu, Naver and Yandex are all from different countries with different cultures. This problem seems to be rampant. This is a problem for all of them.”

The team designed three algorithms to systematically address the issue. The first randomly shuffles the results.

“This one tries to shake things up to keep it from being so homogeneous at the top,” Shah said.

The other two algorithms add more strategy to the image-shuffling. One includes the image’s “relevance score,” which search engines assign based on how relevant a result is to the search query. The other requires the search engine to know the statistics bureau data and then the algorithm shuffles the search results so that the top-ranked images follow the real trend.

The researchers tested their algorithms on the image datasets collected from the Google, Baidu, Naver and Yandex searches. For occupations with a large bias ratio — for example, “biologist + United States” or “CEO + United States” — all three algorithms were successful in reducing gender bias in the search results. But for occupations with a smaller bias ratio — for example, “truck driver + United States” — only the algorithm with knowledge of the actual statistics was able to reduce the bias.

Although the team’s algorithms can systematically reduce bias across a variety of occupations, the real goal will be to see these types of reductions show up in searches on Google, Baidu, Naver and Yandex.

“We can explain why and how our algorithms work,” Feng said. “But the AI model behind the search engines is a black box. It may not be the goal of these search engines to present information fairly. They may be more interested in getting their users to engage with the search results.”

For more information, contact Shah at chirags@uw.edu and Feng at yunhe@uw.edu.

Faculty/staff honors: East Asia Resource Center grant; career awards in robotics, information processing

91̽��News staff — Tue, 02 Jun 2020 22:14:54 +0000

Recent honors to 91̽�� faculty and staff have come from the British Computer Society Information Retrieval Specialist Group, the Freeman Foundation and the IEEE.

Allen School’s Dieter Fox honored by national engineering institute

Dieter Fox

, professor in the UW’s Paul G. Allen School of Computer Science & Engineering, is the recipient of the from the of the national engineering institute IEEE.

The award, established in 1998, recognizes individuals who through research, development or engineering have had a significant impact in the robotics or automation fields. It comes with a $2,000 cash award.

Fox was honored in particular “for pioneering contributions to probabilistic state estimation, RGB-D perception, machine learning in robotics, and bridging academic and industrial robotics research.” RGB-D is a for imaging of color and depth in robotics.

Fox, who joined the 91̽��is 2000, is director of the and senior director of robotics research at . He will receive the honor during the society’s annual , which is being held online through August 31.

He is a fellow of both the IEEE and the Association for the Advancement of Artificial Intelligence and has published more than 240 technical papers. He also co-authored the 2005 textbook “.”

IEEE is the accepted name for the Institute of Electrical and Electronic Engineers, whose focus has grown beyond those technical interests in recent years.