A.I. Masters Six-Player Texas Hold'em, a White Whale for A.I. Researchers

A.I. is outgrowing perfect information games. That's a big deal. 

In a landmark for artificial intelligence research, an A.I. has been developed that is capable of beating professionals at no-limit, six-player Texas Hold’em poker. It’s the first time an A.I. has mastered a game of that complexity, and, according to the researchers, the first time an A.I. has beaten professionals in a game with more than two players or two teams.

The new artificial intelligence system was developed by researchers at Facebook and Carnegie Mellon. As detailed in a new Science paper, the new system, Pluribus, was able to beat professionals — all of whom had won at least $1 million, and some as much as $10 million — in a one-A.I., five-person scenario.

While Pluribus was playing for chips, not cash, if each chip were worth $1, it would have won an average of $5 per hand, or about $1,000 per hour.

Not bad.

Of course, the ramifications could spread far beyond the poker circuit. After all, there’s a reason why A.I. researchers are always trying to teach computers how to play games. Games are a surrogate for teaching A.I. how to solve real-world problems, offering researchers a roadmap toward A.I. systems that can not only automate, but can interpret nuance, adapt to new information, and think strategically. And in this regard, Pluribus is a big leap forward.

After all, most real-world scenarios where you might want the help of an A.I. involve more than one person.

Related Video: Researchers behind Pluribus explain the real-world applications of a previous version of their A.I., called Libratus.

How Pluribus Mastered Poker

For decades, A.I. has gradually bested humans in chess, Go, and even intricate computer games like Dota. But these games all share a limitation that makes them comparatively easy for computers to master. Not only are they one-on-one contests; they’re also so-called perfect information games, in which every player can see the full state of the game. Computers relied on that complete visibility to beat humans.

Because it’s filled with bluffing, hidden motivations, and conflicting signals, Texas Hold’em has long been a white whale for researchers. Perhaps a player wanted to lose that hand, to make an opponent overconfident. Perhaps they’re betting big right now because, actually, their hand totally sucks. In 2017, the researchers behind Pluribus launched an earlier version of the system, called Libratus, which could beat professionals in a one-on-one setting using a three-pronged algorithm powered by a supercomputer. (Another sophisticated poker system, called DeepStack, defeated professionals in heads-up no-limit poker in 2016.)

Libratus’s first module creates a simple abstraction of the game, using self-play to build a strategy that works for a no-limit match’s early stages. Its second module, called a subgame solver, kicks in later and creates a new strategy for each hand that matches the larger blueprint. Finally, a third module, called a self-improver, uses an opponent’s actual moves to make strategic guesses about which direction the game is moving in, prioritizing which parts of the “game tree” are worth filling in.

Even with its limitations, Libratus was compelling enough that the Pentagon paid $10 million to use it to improve its war games training, Business Chief reported.

A new A.I. was able to beat five human opponents in no-limit poker, a sign that A.I. systems can master problem-solving with imperfect information.

Libratus’s approach, however, couldn’t be scaled up to six players, because a six-player match is exponentially more complicated. So the team developed a new system, one which uses a single blueprint strategy for the entire game, plus a search algorithm that allows Pluribus to update its strategy in real time. Pluribus taught itself the blueprint strategy using a modified version of a self-play algorithm called Monte Carlo counterfactual regret minimization.

In this system, one player in each simulated hand is designated, at random, as a “traverser,” whose strategy is constantly updated as training goes on. After each hand, an algorithm evaluates every decision the traverser made against all the other decisions it could have made instead.
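The update rule at the heart of counterfactual regret minimization can be sketched in a few lines. The toy below is a minimal, illustrative example, not Pluribus itself: it applies regret matching to rock-paper-scissors against a hypothetical opponent who over-plays rock, so the learner’s average strategy should drift toward paper. All names and the opponent’s weights are assumptions for the sake of the demo.

```python
import random

# Illustrative sketch of regret matching, the core update in
# counterfactual regret minimization (CFR). Not Pluribus's actual code.

ACTIONS = ["rock", "paper", "scissors"]

def strategy_from_regrets(regrets):
    """Turn accumulated regrets into a mixed strategy:
    positive regrets are normalized into action probabilities."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)  # uniform fallback
    return [p / total for p in positives]

def utility(a, b):
    """Payoff for playing action a against action b: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0.0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1.0 if (a, b) in wins else -1.0

def train(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        for i, p in enumerate(strat):
            strategy_sum[i] += p
        # The "traverser" samples an action; the opponent plays a fixed,
        # exploitable strategy (over-playing rock) so convergence is visible.
        opp = rng.choices(ACTIONS, weights=[0.4, 0.3, 0.3])[0]
        mine = rng.choices(ACTIONS, weights=strat)[0]
        payoff = utility(mine, opp)
        # Regret update: how much better each alternative action
        # would have done than the action actually played.
        for i, alt in enumerate(ACTIONS):
            regrets[i] += utility(alt, opp) - payoff
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

average_strategy = train()
```

Note that the output is a mixed strategy, a probability distribution over actions, rather than a single fixed move; that randomization is exactly what the professionals quoted below say humans struggle to execute consistently.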

“The bot went from being a beatable mediocre player to competing with the best players in the world in a few weeks,” said Darren Elias, a professional poker player who helped train the algorithm, in a statement. “Its major strength is its ability to use mixed strategies. That’s the same thing that humans try to do. It’s a matter of execution for humans — to do this in a perfectly random way and to do it so consistently. Most people just can’t.”

Researchers note that the strategy they used to train Pluribus needn’t be confined to poker; it might be able to create strategies for real-world situations that involve multiple people but a limited ability to collude, for example, auctions, traffic jams, and finance.