

Pioneering AI Research to Address Complex Challenges in Society

In 1977, a young postdoctoral researcher came to UMass Amherst to join a lab working on neural networks. Andrew Barto was interested in exploring a new theory that the human brain was driven by billions of nerve cells behaving like hedonists—that is, each trying to maximize pleasure and minimize pain. He was soon joined by a doctoral student, Richard Sutton, and together the two applied this concept to studying the development of artificial intelligence (AI) systems.
“UMass Amherst gave us the opportunity to be free ranging, exploring, and pioneering the field,” Barto recalls. This field, which became known as “reinforcement learning” (RL), is a branch of machine learning and the foundation of many of today’s most impactful and promising AI applications, from chatbots like ChatGPT to medicine to robots for use in the home and commerce. Though this type of "trial-and-error" learning had long been studied as part of animal learning in psychology, efforts to create RL AI systems were still in their infancy when Barto and Sutton began their work.
Barto, today a professor emeritus in UMass’s Manning College of Information & Computer Sciences (CICS), and Sutton, now professor of computer science at the University of Alberta, are credited with developing the conceptual and algorithmic foundations of RL and helping to launch the field. In March 2025, they received the 2024 ACM A.M. Turing Award, often referred to as the “Nobel Prize of Computing.” It is given by the Association for Computing Machinery, the world’s largest educational and scientific computing society, and carries a $1 million prize. [Read more about this award.]
“When we started, it was extremely unfashionable to do what we were doing,” Barto told Axios, which was among numerous news outlets to cover the announcement of the Turing Award. “It had been dismissed, actually, by many people.”
Yet RL has proven to be an essential part of intelligent systems. After overcoming the initial skepticism, the field has taken off faster than many expected over the past decade, according to Philip Thomas, associate professor in UMass’s CICS and current co-director of the Autonomous Learning Lab (ALL) founded by Barto.

Today, UMass Amherst remains in the vanguard of RL research, with three core faculty members (all former students of Barto) focused on foundational research to make RL algorithms more reliable, easier to use, and more efficient. Barto’s legacy is apparent throughout the college, with numerous other faculty members conducting research that connects to RL. In August 2024, UMass hosted the inaugural Reinforcement Learning Conference on its campus, with more than 500 participants from around the world—a testament to the university’s predominance in RL. (The photo at the top of the story shows Barto with students and faculty at the conference). Moreover, Barto’s impact is wide-reaching in the broader RL field, with more than a dozen of his former students working in—or leading—top RL research programs at universities around the world.
What Is Reinforcement Learning?
According to Thomas, the field of machine learning—how computers learn from data, or information about the world—has two major branches. The first, known as supervised learning, is used to train computers on problems with known solutions.
“A classic example is recognizing handwritten letters. As humans, we know what symbols are correctly labeled as an ‘A’ or ‘B’ and so on, so we can train a computer to be more likely to produce the correct output by just shifting its decisions towards the correct response,” Thomas explains.
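That “shifting toward the correct response” can be made concrete with a toy example. The sketch below is illustrative and assumed (it is not from the article or Thomas’s lab): a tiny perceptron is nudged toward known-correct labels on a handful of 2-D points, standing in for labeled letter images.

```python
# Illustrative sketch of supervised learning: whenever the model's
# prediction disagrees with the known-correct label, shift the model's
# weights toward the correct response.

data = [((2.0, 1.0), 1), ((1.5, 2.0), 1),        # points labeled +1 ("A")
        ((-1.0, -2.0), -1), ((-2.0, -0.5), -1)]  # points labeled -1 ("B")

w = [0.0, 0.0]   # weights
bias = 0.0

for _ in range(10):                      # a few passes over the labeled data
    for (x1, x2), label in data:
        pred = 1 if w[0] * x1 + w[1] * x2 + bias > 0 else -1
        if pred != label:                # wrong? nudge toward the correct label
            w[0] += label * x1
            w[1] += label * x2
            bias += label
```

The key point is that the correct answer for every training example is known in advance, so every update has an unambiguous target.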
RL, on the other hand, typically tackles more complicated problems using data that doesn’t include the “right” answer.
“These are problems where we don’t know what we should do—we just know how good the outcome is,” says Thomas. For example, his lab is working on research to apply RL to Type 1 diabetes treatment: studying how much insulin to inject to keep a patient’s blood glucose near some target level.
“Having a positive outcome can be viewed as a ‘reward,’ so we’re training programs through trial and error to maximize the amount of reward they get and minimize the penalty, or cost,” he says. “In animals, this is called operant conditioning.”
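The trial-and-error idea Thomas describes can be sketched in a few lines. The example below is an assumed illustration (not code from the article): an agent repeatedly chooses between two actions whose payoff probabilities are hidden from it, observes only a numeric reward, and gradually shifts toward the action that earns more.

```python
import random

# Illustrative sketch of reward-driven trial and error: an epsilon-greedy
# agent on a two-armed bandit. The agent never sees the "right" answer,
# only a reward signal after each choice.

random.seed(0)
TRUE_REWARD = {"a": 0.3, "b": 0.7}   # payoff probabilities, hidden from the agent

estimates = {"a": 0.0, "b": 0.0}     # the agent's running reward estimates
counts = {"a": 0, "b": 0}
epsilon = 0.1                        # fraction of the time spent exploring

for step in range(5000):
    if random.random() < epsilon:                 # explore occasionally
        arm = random.choice(["a", "b"])
    else:                                         # otherwise exploit the best estimate
        arm = max(estimates, key=estimates.get)
    reward = 1.0 if random.random() < TRUE_REWARD[arm] else 0.0
    counts[arm] += 1
    # incremental average: shift the estimate toward the observed reward
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

After enough trials the agent's estimates favor arm "b", the higher-paying action, even though it was never told which arm was correct.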
Put another way, supervised learning allows AI systems to make predictions while RL enables systems to make optimal decisions to achieve a desired outcome or set of outcomes, explains Bruno Castro da Silva, assistant professor in CICS and co-director of the Autonomous Learning Lab.
Though supervised learning and RL emerged around the same time, supervised learning took off first and became the predominant branch. Over the years, however, RL has proven incredibly valuable in tackling many complex problems where the right answer is not yet known.
“I’m surprised how long it took people to recognize that RL was something new, and that it was not just an academic curiosity without real-world applications,” says da Silva.
How Is Reinforcement Learning Used Today?
Though it’s been around since the early 1980s, RL’s real-world applications have only begun emerging over the past decade. “More and more, we’re seeing RL actually be applied in the real world,” says Thomas.
Today, RL plays a vital role in training algorithms for numerous applications, some already in use and others still in development. These applications include:
- Refining the responses of chatbots, like ChatGPT, to improve tone and quality for interactions with humans;
- Treating medical conditions, such as diabetes and sepsis;
- Driving autonomous vehicles;
- Controlling prosthetic limbs;
- Operating water treatment plants;
- Controlling the plasma in a tokamak fusion reactor;
- Shedding new light on the dopamine system in the human brain;
- Guiding business decisions, such as whether a bank should issue a loan to a customer;
- Targeting advertisements on the web;
- And recommending content to users on platforms such as YouTube, Spotify, and Netflix.
In the near future, da Silva predicts, “I think companies are going to realize that this technology is really profitable: that they can use it to make any type of decision now made by humans.”
And while today’s large factory robots perform simple, repetitive tasks, da Silva says RL will soon open the door for small robots to carry out many more tasks with greater adaptability, both in commerce and in the home.
“I think gradually robots are going to replace employees in many jobs, with huge implications for society. This will save companies money, but many people are probably going to lose their jobs,” da Silva adds. “At the same time, this is going to create opportunities for other new types of jobs.”
But Is It Safe?

Ensuring safety and fairness is crucial when deploying AI in the real world, especially in high-stakes contexts like making decisions (or supporting decisions made by humans) related to health care, hiring, lending, or criminal justice. But historically, algorithms have often failed in this regard, says Thomas. For example, criminal justice systems in several states used a computer algorithm to predict the risk of a defendant re-offending when making sentencing decisions. But this algorithm was found to be biased against racial minorities. As a ProPublica analysis found, the formula was almost twice as likely to falsely predict that Black defendants would re-offend compared to white defendants, while white defendants were mislabeled as low risk for re-offending more often than Black defendants.
In UMass’s Autonomous Learning Lab, researchers are studying ways to make systems behave better to meet the expectations of their users. In a paper published in Science in November 2019, “Preventing Undesirable Behavior of Intelligent Machines,” they proposed a new framework that would shift the burden of ensuring an algorithm (trained with either supervised or reinforcement learning) is well behaved from the user to the machine learning researchers who design it. The paper was authored by Thomas, da Silva, Barto, Yuriy Brun, Stephen Giguere, and Emma Brunskill.
Under this framework, it is incumbent on the algorithm designer to understand the user’s definition of undesirable behavior and what might cause it, and to avoid that behavior with high probability (confidence). The researchers named algorithms designed using this framework “Seldonian” algorithms after a character in Isaac Asimov’s science fiction novels. They demonstrated that they were, for example, able to simulate the safe use of RL for type 1 diabetes treatment using a Seldonian algorithm. They also have developed Seldonian algorithms that ensure fairness for both supervised learning and RL, with applications ranging from using AI to automatically improve online courses to ensuring the fairness of systems that influence loan approvals.
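The overall shape of this framework can be sketched as follows. The code below is a hypothetical simplification, not the authors’ published algorithms: a candidate policy is returned only if a statistical safety test on held-out data gives high-confidence evidence that undesirable behavior is rare; otherwise the algorithm declines to return a solution. The threshold, confidence level, and Hoeffding-style bound here are all illustrative assumptions.

```python
import math

# Hypothetical sketch of the Seldonian pattern: propose a candidate,
# then deploy it only if a high-confidence safety test passes.

def hoeffding_upper_bound(samples, delta):
    """One-sided (1 - delta)-confidence upper bound on the mean of
    values in [0, 1], via Hoeffding's inequality."""
    n = len(samples)
    mean = sum(samples) / n
    return mean + math.sqrt(math.log(1.0 / delta) / (2.0 * n))

def seldonian_check(candidate, safety_data, threshold=0.05, delta=0.05):
    # candidate: a proposed policy; safety_data: per-episode indicators
    # (1 = undesirable behavior occurred) measured on held-out data.
    upper = hoeffding_upper_bound(safety_data, delta)
    if upper <= threshold:   # undesirable behavior bounded with confidence 1 - delta
        return candidate
    return None              # "no solution found" rather than an unverified policy

# Plenty of safety data with few failures: the candidate is accepted.
policy = seldonian_check("candidate-policy", [0] * 3900 + [1] * 100)
```

With only a small held-out sample, the same failure rate would not clear the confidence bound and the function would return `None`, which is the point of the design: the burden of proof sits with the algorithm, not the user.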
The researchers hope that Seldonian algorithms will not only make current real-world RL applications safer and fairer but will also open up new applications for which the use of machine learning was previously deemed too risky.
“We see the potential of RL to solve so many of the challenges our society is facing—improving medical treatments, increasing the efficiency of all our systems, and advancing fairness,” says Thomas. “All this speaks to the Manning College of Information & Computer Sciences’ commitment to computing for the common good.”
Powering Progress in Robots
With RL playing a key role in advancing robotics—ranging from manipulation and locomotion to natural language control of robots—UMass researchers are working to ensure their safety. Scott Niekum, an associate professor in CICS who also studied with Barto as a graduate student at UMass, today runs the Safe, Confident, and Aligned Learning + Robotics (SCALAR) lab at UMass. Research in the lab aims to enable robots and other learning agents to be deployed in the real world with minimal expert intervention.
In 2023, Niekum was among 22 tech leaders who signed a statement about the potential threat AI poses to society, reading, simply, "Mitigating the risk of extinction from AI should be a global priority alongside other societal scale risks such as pandemics and nuclear war."
As he explained on NPR’s Morning Edition at the time, “We don't really know how to accurately communicate to AI systems what we want them to do. So imagine I want to teach a robot how to jump. So I say, ‘Hey, I'm going to give you a reward for every inch you get off the ground.’ Maybe the robot decides just to go grab a ladder and climb up it and it's accomplished the goal I set out for it. But in a way that's very different from what I wanted it to do. And that maybe has side effects on the world. Maybe it's scratched something with the ladder. Maybe I didn't want it touching the ladder in the first place.
“And if you swap out a ladder and a robot for self-driving cars or AI weapon systems or other things, that may take our statements very literally and do things very different from what we wanted.”
Given the importance of this problem, Niekum and the SCALAR research group have developed a wide range of novel and effective methods to ensure that machine learning systems remain safe and aligned with human objectives.

Above, Andrew Barto poses for a photo with CICS students and faculty at an event celebrating the announcement of the Turing Award on March 5, 2025.
Designing RL for the Real World
Beyond ensuring fairness and safety, UMass researchers are working to refine RL technologies in other ways to make them more useful in the real world.
While scientists can currently train AI systems to effectively solve narrowly defined problems in specific situations, these systems are less adept at adapting to small changes. For example, an autonomous vehicle may be able to drive well under normal circumstances but may struggle if it encounters rain.
“We’re studying how to teach AI systems to transfer their knowledge to adapt to new situations or to address variants of a task, so they can solve new problems without having to learn from scratch,” says da Silva.
The ALL researchers are also studying how AI systems can simultaneously optimize several different objectives, which may be in conflict with one another, as is often required in real-world applications. For example, a user of an autonomous vehicle may want to reach their destination as quickly as possible but also to have a smooth ride, with minimal swerving and sudden braking. These priorities may change depending on the type of trip—for example, driving to the airport to catch a flight versus a leisurely day out with friends. While traditional RL algorithms would rate the combined performance on all objectives with one overall “reward” number for the system to optimize, “multi-objective reinforcement learning” considers the different objectives separately, and allows the system to quickly adapt to new user preferences and priorities.
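A minimal sketch of that idea, with made-up policies and numbers purely for illustration: each policy carries a vector of per-objective values rather than a single reward, and the user’s preference weights select among them at decision time, without retraining.

```python
# Hypothetical multi-objective example: two driving policies, each scored
# separately on speed and comfort (illustrative values in [0, 1]).

policies = {
    "fast":   {"speed": 0.9, "comfort": 0.3},
    "smooth": {"speed": 0.5, "comfort": 0.9},
}

def best_policy(weights):
    # Linear scalarization: combine the per-objective values under the
    # user's current preference weights, then pick the best policy.
    def score(values):
        return sum(weights[k] * values[k] for k in weights)
    return max(policies, key=lambda name: score(policies[name]))

airport_run = best_policy({"speed": 0.8, "comfort": 0.2})  # catching a flight
leisure_day = best_policy({"speed": 0.2, "comfort": 0.8})  # relaxed outing
```

Because the objectives are kept separate, changing the weights is enough to switch the system’s behavior from the fast policy to the smooth one; a single scalar reward would have baked one trade-off in at training time.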
Preparing for an AI Future
As AI becomes ever more prevalent in the world, UMass is preparing its students through coursework on the fundamentals of machine learning, as well as its application in many diverse areas. Several undergraduates have also completed their honors theses in the Autonomous Learning Lab, while graduate students play an important role in the lab’s research.
“UMass has been a leader in AI for the past 30 years, including training the next generation of machine learning researchers and practitioners,” says da Silva, who teaches courses on machine learning for undergraduate, master’s, and PhD students. “We offer a wide range of courses that allow students to explore numerous different directions they may want to go in the future, from data science to mobile health sensing systems to computer vision in robots.”

Moreover, UMass is unique in balancing the hands-on application of machine learning with robust education on the foundations of RL.
“It’s really important for students to understand the theory and to think deeply about how and why these things work, so they can identify gaps in the literature and challenge assumptions in the field,” says da Silva. “We want students to carry forth the legacy of Andy [Barto] and Rich [Sutton] to not only make incremental improvements, but to find qualitatively novel approaches to AI.”
Beyond Barto’s immense contributions to the scientific understanding of RL, he is credited with helping to form the truly collaborative research community that exists today in the field.
“Andy is extremely humble, kind, and welcoming, always giving credit to others,” says da Silva. “He has really set the tone for this field.”
This story was originally published in March 2025.