Q&A: Helping robots identify objects in cluttered spaces
Researchers at the University of Washington have developed a method that teaches a low-cost robot to identify objects on a cluttered shelf. For the test, the robot (shown here in the center of the photo) was asked to identify all objects on the shelf in front of it. Photo: Samani and Banerjee/IEEE Transactions on Robotics

Imagine a coffee cup sitting on a table. Now, imagine a book partially obscuring the cup. As humans, we still know what the coffee cup is even though we can’t see all of it. But a robot might be confused.

Robots in warehouses and even around our houses struggle to identify and pick up objects if they are too close together, or if a space is cluttered. This is because robots lack what psychologists call “object unity,” or our ability to identify things even when we can’t see all of them.

Researchers at the University of Washington have developed a way to teach robots this skill. The method, called THOR for short, allowed a low-cost robot to identify objects — including a mustard bottle, a Pringles can and a tennis ball — on a cluttered shelf. In a paper published in IEEE Transactions on Robotics, the team demonstrated that THOR outperformed current state-of-the-art models.

UW News reached out to senior author Ashis Banerjee, a UW associate professor in both the industrial & systems engineering and mechanical engineering departments, for details about how robots identify objects and how THOR works.

Ashis Banerjee. Photo: University of Washington

How do robots sense their surroundings?

Ashis Banerjee: We sense the world around us using vision, sound, smell, taste and touch. Robots sense their surroundings using one or more types of sensors. Robots “see” things using either standard color cameras or more complex stereo or depth cameras. While standard cameras simply record colored and textured images of the surroundings, stereo and depth cameras also provide information on how far away the objects are, just like our eyes do.

On their own, however, the sensors cannot enable the robots to make “sense” of their surroundings. Robots need a visual perception system, similar to the visual cortex of the human brain, to process images and detect where all the objects are, estimate their orientations, identify what the objects might be and parse any text written on them.

Why is it hard for robots to identify objects in cluttered spaces?

AB: There are two main challenges here. First, there are likely a large number of objects of varying shapes and sizes. This makes it difficult for the robot’s perception system to distinguish between the different object types. Second, when several objects are located close to each other, they obstruct the views of other objects. Robots have trouble recognizing objects when they don’t have a full view of the object.

Are there any types of objects that are especially hard to identify in cluttered spaces?

AB: A lot of that depends on what objects are present. For example, it is challenging to recognize smaller objects if there are a variety of sizes present. It is also more challenging to differentiate between objects with similar or identical shapes, such as different kinds of balls, or boxes. Additional challenges occur with soft or squishy objects that can change shape as the robot collects images from different vantage points in the room.

So how does THOR work and why is it better than previous attempts to solve this problem?

AB: THOR is really the brainchild of lead author Ekta Samani, who completed this research as a UW doctoral student. The core of THOR is that it allows the robot to mimic how we as humans know that partially visible objects aren’t broken or entirely new objects.

THOR does this by using the shape of objects in a scene to create a 3D representation of each object. From there it uses topology, an area of mathematics that studies the connectivity between different parts of objects, to assign each object to a “most likely” object class. It does this by comparing its 3D representation to a library of stored representations.
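For readers who want a concrete picture of that pipeline, here is a minimal sketch: summarize each object's point cloud with topological features computed via persistent homology, then match the result against a library of stored descriptors. It assumes the open-source `ripser` package; the function names, feature summary and nearest-neighbor matching are illustrative stand-ins, not the authors' actual THOR implementation.

```python
# Hypothetical sketch of topology-based object recognition (not the THOR code).
import numpy as np
from ripser import ripser


def topological_descriptor(points: np.ndarray, maxdim: int = 1) -> np.ndarray:
    """Turn a 3D point cloud into a fixed-length vector of persistence summaries."""
    diagrams = ripser(points, maxdim=maxdim)["dgms"]
    features = []
    for dgm in diagrams:
        # Drop infinite bars (e.g., the connected component that never dies).
        finite = dgm[np.isfinite(dgm[:, 1])]
        lifetimes = finite[:, 1] - finite[:, 0] if len(finite) else np.zeros(1)
        # Simple per-dimension summary: total, max and mean persistence.
        features.extend([lifetimes.sum(), lifetimes.max(), lifetimes.mean()])
    return np.array(features)


def recognize(query_cloud: np.ndarray, library: dict[str, np.ndarray]) -> str:
    """Assign the query object to the 'most likely' class by nearest stored descriptor."""
    q = topological_descriptor(query_cloud)
    return min(library, key=lambda name: np.linalg.norm(q - library[name]))


# Toy example: random point clouds stand in for real segmented object scans.
# In practice the library would be built offline from uncluttered views of each object.
library = {
    "mustard_bottle": topological_descriptor(np.random.rand(300, 3)),
    "tennis_ball": topological_descriptor(np.random.rand(300, 3)),
}
print(recognize(np.random.rand(300, 3), library))
```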


THOR does not rely on training machine learning models with images of cluttered rooms. It just needs images of each of the different objects by themselves. THOR does not require the robot to have specialized and expensive sensors or processors, and it also works well with commodity cameras.

This means that THOR is very easy to build, and is, more importantly, readily useful for completely new spaces with diverse backgrounds, lighting conditions, object arrangements and degree of clutter. It also works better than the existing 3D shape-based recognition methods because its 3D representation of the objects is more detailed, which helps identify the objects in real time.

How could THOR be used?

AB: THOR could be used with any indoor service robot, regardless of whether the robot operates in someone’s home, an office, a store, a warehouse facility or a manufacturing plant. In fact, our experimental evaluation shows that THOR is equally effective for warehouse, lounge and family room-type spaces.

While THOR performs significantly better than the other existing methods for all kinds of objects in these cluttered spaces, it does the best at identifying kitchen-style objects, such as a mug or a pitcher, that typically have distinctive but regular shapes and moderate size variations.

Green boxes shown here surround the objects that the robot correctly identified. Red boxes surround incorrectly identified items. Photo: Samani and Banerjee/IEEE Transactions on Robotics

What’s next?

AB: There are several additional problems that need to be addressed, and we are working on some of them. For example, right now, THOR considers only the shape of the objects, but future versions could also pay attention to other aspects of appearance, such as color, texture or text labels. It is also worth looking into how THOR could be used to deal with squishy or damaged objects, which have shapes that are different from their expected configurations.

Also, some spaces may be so cluttered that certain objects might not be visible at all. In these scenarios, a robot needs to be able to decide to move around to “see” the objects better, or, if allowed, move around some of the objects to get better views of the obstructed objects.

Last but not least, the robot needs to be able to deal with objects it hasn’t seen before. In these scenarios, the robot should be able to place these objects into a “miscellaneous” or “unknown” object category, and then seek help from a human to correctly identify these objects.

This research was funded in part by an Amazon Research Award.

For more information, contact Banerjee at ashisb@uw.edu.

How ergonomic is your warehouse job? Soon, an app might be able to tell you
UW researchers have used deep learning to develop a new system that can monitor factory or warehouse workers and tell them how risky their behaviors are in real time.

In 2017 there were nearly 350,000 incidents of workers taking sick leave due to injuries affecting muscles, nerves, ligaments or tendons — like carpal tunnel syndrome — according to the Bureau of Labor Statistics. Among the workers with the highest number of incidents: people who work in factories and warehouses.


Musculoskeletal disorders happen at work when people use awkward postures or perform repeated tasks. These behaviors generate strain on the body over time. So it’s important to point out and minimize risky behaviors to keep workers healthy on the job.

Researchers at the University of Washington have used machine learning to develop a new system that can monitor factory and warehouse workers and tell them how risky their behaviors are in real time. The algorithm divides up a series of activities — such as lifting a box off a high shelf, carrying it to a table and setting it down — into individual actions and then calculates a risk score associated with each action.

The team published its findings June 26 in IEEE Robotics and Automation Letters and will present the results Aug. 23 at a conference in Vancouver, British Columbia.

“Right now workers can do a self-assessment where they fill out their daily tasks on a table to estimate how risky their activities are,” said senior author Ashis Banerjee, an assistant professor in both the industrial & systems engineering and mechanical engineering departments at the UW. “But that’s time consuming, and it’s hard for people to see how it’s directly benefiting them. Now we have made this whole process fully automated. Our plan is to put it in a smartphone app so that workers can even monitor themselves and get immediate feedback.”

For these self-assessments, people currently use a snapshot of a task being performed. The position of each joint gets a score, and the sum of all the scores determines how risky that pose is. But workers usually perform a series of motions for a specific task, and the researchers wanted their algorithm to be able to compute an overall score for the entire action.
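As a rough illustration of that snapshot-style scoring, the hypothetical sketch below assigns each joint a score based on how far it deviates from a neutral posture and sums the scores into a single pose risk. The joints, thresholds and score values are invented for illustration; they are not the actual REBA/RULA tables or the study's scoring rules.

```python
# Illustrative snapshot scoring: one score per joint, summed into a pose risk.
def joint_score(angle_deg: float, neutral: float, moderate: float, severe: float) -> int:
    """Score one joint: 1 near neutral, 2 if moderately deviated, 3 if severely deviated."""
    deviation = abs(angle_deg - neutral)
    if deviation < moderate:
        return 1
    return 2 if deviation < severe else 3


def pose_risk(joint_angles: dict[str, float]) -> int:
    """Sum per-joint scores for one snapshot of a worker's pose."""
    thresholds = {  # joint: (neutral angle, moderate deviation, severe deviation) in degrees
        "trunk": (0.0, 20.0, 60.0),
        "neck": (0.0, 10.0, 20.0),
        "upper_arm": (0.0, 45.0, 90.0),
        "knee": (180.0, 30.0, 60.0),
    }
    return sum(joint_score(joint_angles[j], *thresholds[j]) for j in thresholds)


# Example pose: bent trunk and a raised arm push the total score up.
print(pose_risk({"trunk": 35.0, "neck": 5.0, "upper_arm": 100.0, "knee": 170.0}))
```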

Moving to video is more accurate, but it requires a new way to add up the scores. To train and test the algorithm, the team created a dataset containing 20 three-minute videos of people doing 17 activities that are common in warehouses or factories.

A GIF of people moving boxes from shelves to a table
To train and test the algorithm, the team created a dataset containing 20 three-minute videos of people doing 17 activities that are common in warehouses or factories. Photo: University of Washington

“One of the tasks we had people do was pick up a box from a rack and place it on a table,” said first author Behnoosh Parsa, a UW mechanical engineering doctoral student. “We wanted to capture different scenarios, so sometimes they would have to stretch their arms, twist their bodies or bend to pick something up.”

The researchers captured their dataset using a Microsoft Kinect camera, which recorded 3D videos that allowed them to map out what was happening to the participants’ joints during each task.

Using the Kinect data, the algorithm first learned to compute risk scores for each video frame. Then it progressed to identifying when a task started and ended so that it could calculate a risk score for an entire action.
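The sketch below illustrates that two-stage idea under simplified assumptions: a placeholder function scores each frame from tracked joint positions, and the scores are averaged within each action segment. The per-frame scoring rule and the fixed segment boundaries are stand-ins for the learned models described in the paper.

```python
# Hedged sketch: per-frame risk scores aggregated into one score per action segment.
import numpy as np


def frame_risk(joints_3d: np.ndarray) -> float:
    """Placeholder per-frame risk; the real system learns this from Kinect joint data."""
    return float(np.clip(np.abs(joints_3d).mean() * 10.0, 1.0, 15.0))


def segment_and_score(frames: list[np.ndarray], boundaries: list[int]) -> list[float]:
    """Average frame-level risk within each detected action segment."""
    scores = np.array([frame_risk(f) for f in frames])
    action_scores = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        action_scores.append(float(scores[start:end].mean()))
    return action_scores


# Example: 90 frames of 25 tracked joints, split into three actions (lift, carry, place).
frames = [np.random.randn(25, 3) * 0.1 for _ in range(90)]
print(segment_and_score(frames, boundaries=[0, 30, 60, 90]))
```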

The algorithm labeled three actions in the dataset as risky behaviors: picking up a box from a high shelf, and placing either a box or a rod onto a high shelf.

Now the team is developing an app that factory workers and supervisors can use to monitor in real time the risks of their daily actions. The app will provide warnings for moderately risky actions and alerts for high-risk actions.

Eventually the researchers want robots in warehouses or factories to be able to use the algorithm to help keep workers healthy. To see how well the algorithm could work in a hypothetical warehouse, the researchers had a robot monitor two participants performing the same activities. Within three seconds of the end of each activity, the robot showed a score on its display.

A robot arm with an output display that shows the scores of each activity a human subject performed
The researchers had a robot (white arm) monitor participants performing activities in a warehouse-like setting. At the end of each activity, the robot showed a score on its display (right). Photo: Parsa et al./IEEE Robotics and Automation Letters

“Factories and warehouses have used automation for several decades. Now that people are starting to work in settings where robots are used, we have a unique opportunity to split up the work so that the robots are doing the risky jobs,” Banerjee said. “Robots and humans could have an active collaboration, where a robot can say, ‘I see that you are picking up these heavy objects from the top shelf and I think you may be doing that a lot of times. Let me help you.'”

Additional co-authors are , and , who are UW mechanical engineering doctoral students; , who completed this research as a summer intern at the UW; and , a professor in the UW mechanical engineering department. Funding and support for this project has been provided by the State of Washington, Department of Labor and Industries, Safety and Health Investment Projects. This research was also funded by a gift from Amazon Robotics.

###

For more information, contact Banerjee at ashisb@uw.edu and Parsa at behnoosh@uw.edu.
