Triumph over five human opponents at Texas hold’em brings bots closer to solving complicated real-world problems.
Machines have raised the stakes once again. A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker, the most popular variant of the game. It is the first time that an artificial-intelligence (AI) program has beaten elite human players at a game with more than two players1.
“While going from two to six players might seem incremental, it’s actually a big deal,” says Julian Togelius at New York University, who studies games and AI. “The multiplayer aspect is something that is not present at all in other games that are currently studied.”
The team behind Pluribus had already built an AI, called Libratus, that had beaten professionals at two-player poker. It built Pluribus by updating Libratus, creating a bot that needs much less computing power to play matches. In a 12-day session spanning more than 10,000 hands, it beat 15 top human players. “A lot of AI researchers didn’t think it was possible to do this” with such techniques, says Noam Brown at Carnegie Mellon University in Pittsburgh, Pennsylvania, and Facebook AI Research in New York, who developed Pluribus with his Carnegie Mellon colleague Tuomas Sandholm.
Other AIs that have mastered human games — such as Libratus and DeepMind’s Go-playing bots — have shown that they are unbeatable in two-player zero-sum matches. In these scenarios, there is always one winner and one loser, and game theory offers a well-defined best strategy.
But game theory is less helpful for scenarios involving multiple parties with competing interests and no clear win–lose conditions — which reflect most real-life challenges. By solving multiplayer poker, Pluribus lays the foundation for future AIs to tackle complex problems of this sort, says Brown. He thinks that this success is a step towards applications such as automated negotiations, better fraud detection and self-driving cars.
Extra complex
To tackle six-player poker, Brown and Sandholm radically overhauled Libratus’s search algorithm. Most game-playing AIs search forwards through decision trees for the best move to make in a given situation. Libratus searched to the end of a game before choosing an action.
But the complexity introduced by extra players makes this tactic impractical. Poker requires reasoning with hidden information — players must work out a strategy by considering what cards their opponents might have and what opponents might guess about their hand based on previous betting. And with more players, choosing an action at any given moment becomes more difficult, because it involves assessing a larger number of possibilities.
The key breakthrough was developing a method that allowed Pluribus to make good choices after looking ahead only a few moves rather than to the end of the game.
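A minimal sketch of that idea in Python, with a deliberately generic game interface (the function names and parameters here are illustrative assumptions, not Pluribus’s actual code): instead of expanding the game tree to the end of a hand, the search stops after a fixed number of moves and trusts a value estimate at the frontier.

```python
# Toy sketch of depth-limited search over an assumed generic game interface.
# Rather than playing every line out to the end of a hand, the recursion
# stops after `depth` moves and scores the position with an estimate.

from typing import Any, Callable, Iterable

def depth_limited_value(
    state: Any,
    depth: int,
    actions: Callable[[Any], Iterable[Any]],   # legal moves in a state
    apply: Callable[[Any, Any], Any],          # state transition
    estimate: Callable[[Any], float],          # learned/heuristic value
    is_terminal: Callable[[Any], bool],
) -> float:
    """Value of `state`, looking ahead at most `depth` moves."""
    if is_terminal(state) or depth == 0:
        return estimate(state)   # cut off early: trust the estimate
    moves = list(actions(state))
    if not moves:
        return estimate(state)
    # Expand only a shallow tree: one level here, `depth - 1` below.
    return max(
        depth_limited_value(apply(state, m), depth - 1,
                            actions, apply, estimate, is_terminal)
        for m in moves
    )
```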
Pluribus teaches itself from scratch using a form of reinforcement learning similar to that used by DeepMind’s Go AI, AlphaZero. It starts off playing poker randomly and improves as it works out which actions win more money. After each hand, it looks back at how it played and checks whether it would have made more money with different actions, such as raising rather than sticking to a bet. If the alternatives lead to better outcomes, it will be more likely to choose them in future.
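One way to picture that self-improvement step is regret matching, a standard building block of poker AIs: actions that would have earned more than the one actually taken get more weight next time. The sketch below is a toy stand-in for the learning described above, with invented situations and payoffs, not Pluribus’s real update.

```python
from collections import defaultdict

# Toy regret-matching update: an illustrative stand-in for the learning
# step described above, with made-up situations and payoffs.

regret = defaultdict(lambda: defaultdict(float))  # situation -> action -> regret

def update_regrets(situation: str, payoffs: dict, chosen: str) -> None:
    """After a hand, credit actions that would have beaten `chosen`."""
    for action, payoff in payoffs.items():
        regret[situation][action] += payoff - payoffs[chosen]

def strategy(situation: str, actions: list) -> dict:
    """Play actions in proportion to their accumulated positive regret."""
    positive = {a: max(regret[situation][a], 0.0) for a in actions}
    total = sum(positive.values())
    if total == 0:
        return {a: 1.0 / len(actions) for a in actions}  # start off random
    return {a: r / total for a, r in positive.items()}

# Example: folding earned nothing, but raising would have won a 10-chip pot.
update_regrets("K-Q suited, facing a bet",
               {"fold": 0.0, "call": 4.0, "raise": 10.0}, "fold")
print(strategy("K-Q suited, facing a bet", ["fold", "call", "raise"]))
# raising now gets the most weight in future hands
```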
By playing trillions of hands of poker against itself, Pluribus created a baseline strategy, known as its blueprint, that it draws on in matches. At each decision point, it compares the state of the game with its blueprint and searches a few moves ahead to see how the action might play out. It then decides whether it can improve on the blueprint’s choice. And because it taught itself to play without human input, the AI settled on a few strategies that human players tend not to use.
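At play time, then, one can imagine something like the following hypothetical sketch, in which the blueprint table, the situation “buckets” and all the numbers are invented for illustration rather than taken from Pluribus.

```python
# Hypothetical sketch: consult a precomputed blueprint strategy, then let
# a shallow lookahead override it. All names and numbers are invented.

blueprint = {
    # situation bucket -> action probabilities learned from self-play
    "preflop:pair:raised-pot": {"fold": 0.05, "call": 0.45, "raise": 0.50},
}

def act(bucket: str, lookahead_values: dict) -> str:
    """Follow the blueprint unless the live search finds something better."""
    prior = blueprint.get(bucket, {"fold": 1/3, "call": 1/3, "raise": 1/3})
    favourite = max(prior, key=prior.get)               # blueprint's pick
    searched = max(lookahead_values, key=lookahead_values.get)
    # Override only when the few-moves-ahead estimate is strictly better.
    if lookahead_values[searched] > lookahead_values.get(favourite, 0.0):
        return searched
    return favourite

# e.g. the search estimates that raising is worth more chips than calling:
print(act("preflop:pair:raised-pot", {"fold": 0.0, "call": 2.5, "raise": 4.0}))
# -> "raise"
```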
AI playpen
Pluribus’s success is largely down to its efficiency. When playing, it runs on just two central processing units (CPUs). By contrast, DeepMind’s original Go bot used nearly 2,000 CPUs, and Libratus 100 CPUs, when they first beat top professionals. When playing against itself, Pluribus plays a hand in around 20 seconds — roughly twice as fast as professional humans.
Games have proved a great way to measure progress in AI because bots can be scored against top humans — and objectively be hailed as superhuman if they triumph. But Brown thinks that AIs are outgrowing their playpen. “This was the last remaining challenge in poker,” he says.
But Togelius thinks there is mileage yet for AI researchers and games. “There’s a lot of unexplored territory,” he says. Few AIs have mastered more than one game, which requires general ability rather than a niche skill. And there’s more than simply playing games, says Togelius. “There’s also designing them. A great AI challenge if there ever was one.”