Software That Beat World’s Best Human ‘Go’ Player Has Roots in UMass Amherst Computer Science

Go, ancient board game

AMHERST, Mass. – The ancient board game Go, played with black and white stones on a grid, is so complex that experts in artificial intelligence (AI) once believed it would be decades, if ever, before a computer could beat the best human player. But today AlphaGo, an AI program developed by Google DeepMind, has defeated the world champion Lee Se-dol, winning the first three of five matches this week in a tournament in Seoul. Lee won the fourth game, and the fifth and final game will be played at 1 p.m. local time today.

A small group of computer scientists and AI experts at the University of Massachusetts Amherst, whose early work in machine learning helped shape AlphaGo’s approach, is enjoying the machine’s victory.

Sridhar Mahadevan, co-director of the Autonomous Learning Laboratory at the College of Information and Computer Sciences at UMass Amherst with professor emeritus Andrew Barto, says, “This is a huge breakthrough for AI and machine learning. AlphaGo was not programmed explicitly to play Go, but learned to play using an approach called reinforcement learning, or RL, pioneered here at UMass Amherst several decades ago by Andy Barto, considered the father of RL, and his students, principally Rich Sutton. Many of the top researchers in the field today are his students. So this achievement was made possible by research done at UMass Amherst computer science, and it’s a source of special pride for us all.”

Barto and Sutton’s book, “Reinforcement Learning: An Introduction,” considered the bible of the field, is cited in DeepMind’s January paper in Nature outlining AlphaGo’s strategy.

As Mahadevan explains, the standard approach to machine learning, also known as data mining, applied to strategy games is to learn from a large database of moves played by human experts. It works by teaching a machine to quickly search its database for the best move it can make for every position on the board. “But as the game gets more complicated, it doesn’t work,” he says. “It has a ceiling; the calculations get so numerous that this ‘human supervised’ approach to machine learning has never gotten a machine to play games at the championship level. It turns out that in order to get to the highest level, the machine crucially has to also play against itself and learn how to win by trial and error, or reinforcement learning.”
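The classic illustration of this self-play idea, which opens Barto and Sutton’s book, is learning tic-tac-toe by playing against oneself and nudging each position’s estimated win probability toward the value of the position that followed it. The sketch below is a minimal version of that spirit; the learning rate, exploration rate, and game count are illustrative choices, and AlphaGo’s actual system combines deep neural networks with Monte Carlo tree search rather than a lookup table.

```python
import random

# Tabular temporal-difference learning from self-play on tic-tac-toe,
# a toy stand-in for reinforcement learning by trial and error.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

values = {}               # board string -> estimated probability that X wins
ALPHA, EPSILON = 0.3, 0.1  # learning rate and exploration rate (arbitrary)

def value(board):
    w = winner(board)
    if w == "X": return 1.0
    if w == "O": return 0.0
    if "." not in board: return 0.5   # draw
    return values.setdefault(board, 0.5)

def choose(board, player):
    moves = [i for i, s in enumerate(board) if s == "."]
    if random.random() < EPSILON:
        return random.choice(moves)   # occasionally explore a random move
    best = max if player == "X" else min  # X maximizes value, O minimizes
    return best(moves, key=lambda i: value(board[:i] + player + board[i+1:]))

def play_one_game():
    board, player = "." * 9, "X"
    history = [board]
    while winner(board) is None and "." in board:
        i = choose(board, player)
        board = board[:i] + player + board[i+1:]
        history.append(board)
        player = "O" if player == "X" else "X"
    # TD(0) backup: move each state's estimate toward its successor's value
    for prev, nxt in zip(history, history[1:]):
        values[prev] = value(prev) + ALPHA * (value(nxt) - value(prev))

random.seed(0)
for _ in range(5000):
    play_one_game()
```

No database of expert games is consulted anywhere: every value estimate comes from the outcomes of the program’s own games against itself.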

Barto adds, “Many people didn’t think this was possible, but here we are. It’s a truly remarkable development. These results with AlphaGo are really historic and the DeepMind team deserves an enormous amount of credit for their accomplishment.”

He says, “Reinforcement learning, which is inspired by the trial-and-error learning that humans and other animals do, goes way back in the artificial intelligence community. But it had been mostly dropped in favor of other approaches. People thought it was too weak. It was quite a backwater when I came to UMass in 1977, but I came to think that it had been prematurely dismissed.”

Barto adds, “I’m certainly not the real father, because the basic idea goes back at least to the 19th century in psychology. The first computer science mention of RL I have found was by Alan Turing in 1948, but he never implemented it. The AI pioneer Marvin Minsky at MIT, who recently passed away, contributed some of the basic computational ideas. It sporadically arose in engineering and AI but had been considered too slow, requiring too much data. But now we have the computing power, plus deep learning neural networks, that when put together can tackle huge problems. I’m delighted to see that UMass Amherst played a role in this accomplishment.”

Something called the search space for a game, Barto explains, contains all possible games that could be played, and for Go it is astronomically huge, many times larger than for chess, for example. There is a “combinatorial explosion,” he notes, because games can last for many moves and many moves are possible at each turn.
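The scale Barto describes can be put in rough numbers. The figures below are commonly quoted approximations, not exact counts (around 35 legal moves over roughly 80 plies for chess, around 250 legal moves over roughly 150 plies for Go):

```python
import math

# Back-of-the-envelope size of a game tree: branching factor raised to the
# number of plies. The inputs are rough, commonly quoted approximations.
def game_tree_size(branching_factor, plies):
    return branching_factor ** plies

chess = game_tree_size(35, 80)    # ~35 legal moves per turn, ~80 plies
go = game_tree_size(250, 150)     # ~250 legal moves per turn, ~150 plies

print(f"chess: about 10^{math.log10(chess):.0f} possible games")
print(f"go:    about 10^{math.log10(go):.0f} possible games")
```

Both numbers dwarf the count of atoms in the observable universe (about 10^80), and Go’s tree is larger than chess’s by hundreds of orders of magnitude, which is why exhaustively searching it is hopeless.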

“But interestingly that’s not the main problem,” he says. “The real problem, the experts tell me, is coming up with a good evaluation function. That is a way of looking at a board configuration and assigning it a value for how good it would be in terms of winning the game. It’s a prediction of your chances of winning from that configuration, and it has to be quickly computable.”

“Short of planning moves by doing a lot of lookahead searches, you need a shortcut to predict, a very quick assessment without doing all that lookahead. That’s what AlphaGo has come up with and what many people didn’t think was possible. It turns out that a good evaluation function can circumvent the combinatorial explosion,” Barto adds.  
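The role Barto assigns to the evaluation function can be sketched with a toy take-away game (an invented stand-in, nothing like Go in difficulty): a depth-limited search that stops early and asks a fast evaluator for a win estimate at the frontier, instead of searching every line to the end of the game.

```python
# Depth-limited lookahead that consults a quick evaluation function at the
# frontier. The toy game and its heuristic are invented for illustration;
# AlphaGo's real evaluator is a deep neural network trained through play.

def legal_moves(n):
    # toy game: a pile of n stones; a move removes 1 or 2; taking the last wins
    return [m for m in (1, 2) if m <= n]

def evaluate(n):
    # quick estimate of the win probability for the player to move
    # (here it happens to encode the toy game's exact theory:
    # multiples of 3 are losing positions)
    return 0.0 if n % 3 == 0 else 1.0

def search(n, depth):
    """Win chance for the player to move, with lookahead cut off at `depth`."""
    if n == 0:
        return 0.0                 # no move available: player to move has lost
    if depth == 0:
        return evaluate(n)         # the shortcut: no deeper search needed
    return max(1.0 - search(n - m, depth - 1) for m in legal_moves(n))
```

Because the evaluator is accurate and cheap, a shallow search (say, `search(4, 2)`) reaches the same verdict that an exhaustive search to the end of the game would, without paying the exponential cost of full lookahead.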

Mahadevan says the lab that he and Barto co-direct has graduated about 35 Ph.D. students over the past 30 years, about 20 of whom worked in reinforcement learning. At present the program has about 12 Ph.D. students and continues to be one of the world’s leading laboratories in the field.

Barto and Sutton are now revising their 1998 textbook, which has been cited more than 20,000 times, and will include a description of AlphaGo’s success as a case study.