Facebook and CMU’s ‘superhuman’ poker AI beats human pros

AI has definitively beaten humans at another of our favorite games. A poker bot designed by researchers at Facebook's AI lab and Carnegie Mellon University has bested some of the world's top players in a series of six-player no-limit Texas Hold'em games.

Over 12 days and 10,000 hands, an AI system called Pluribus faced 12 pros in two different settings. In one, a single AI played alongside five human players. In the other, five copies of the AI played with one human player (the computer programs were not able to collaborate in this scenario). Pluribus won an average of $5 per hand, with hourly winnings of about $1,000. That is a "decisive margin of victory," the researchers said.

"It's safe to say we're at a superhuman level, and that's not going to change," Noam Brown, a researcher at Facebook AI Research and co-creator of Pluribus, told The Verge.

"Pluribus is a very difficult opponent," said Chris Ferguson, one of the 12 pros who played against the AI.

In a paper published in Science, the scientists behind Pluribus say the victory is an important milestone in artificial intelligence research. Machine learning has already reached a superhuman level in board games like chess and Go and in computer games such as StarCraft II and Dota, but six-player no-limit Texas Hold'em is a harder benchmark. The information needed to win is hidden from the players (which makes it an "imperfect-information game"), and the game also involves multiple players and complex victory outcomes. Go has more possible board positions than there are atoms in the observable universe, which makes it very hard for an AI to plan its next move, but all the information is visible, and the game gives each player only two possible outcomes.


A chart of Pluribus's training regime. "Limping" is a strategy used by some human players that the AI eventually abandoned. Credit: Facebook

In 2015, a machine learning system beat human pros at two-player Texas Hold'em, but the complexity increases dramatically when the number of opponents rises to five. To meet that challenge, Brown and his colleague Tuomas Sandholm, a professor at CMU, relied on a few key techniques.

First, they taught Pluribus to play poker by having it play against copies of itself, a process known as self-play. This is a common technique for training AI, in which the system learns the game through trial and error, here by playing hundreds of thousands of hands against itself. The training was also very efficient. Pluribus was created in just eight days using a 64-core server with less than 512 GB of RAM. Training the program on cloud servers would cost only $150, a bargain compared with the $10,000 price tag of other state-of-the-art systems.
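To make the idea of self-play concrete, here is a minimal sketch on a toy game. It is not Pluribus's training code: it swaps in plain regret matching on rock-paper-scissors, and every name in it (`payoff`, `strategy_from_regrets`, `self_play`) is hypothetical. Two copies of the same learner play each other, and each one shifts toward the moves it regrets not having played.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors (toy game, not poker)


def payoff(a, b):
    """Payoff to the player choosing a against b: +1 win, 0 tie, -1 loss."""
    return [0, 1, -1][(a - b) % 3]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no regrets yet: play uniformly


def self_play(iterations, seed=0):
    """Let two copies of the learner play each other; return average strategies."""
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for player in range(2):
            opponent_move = moves[1 - player]
            actual = payoff(moves[player], opponent_move)
            for a in range(ACTIONS):
                # Regret: how much better action a would have done this round.
                regrets[player][a] += payoff(a, opponent_move) - actual
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(self_play(20000)):
        print(player, [round(p, 3) for p in avg])
```

In this toy game the average strategies should drift toward playing each move about a third of the time, which is the unexploitable strategy for rock-paper-scissors. Poker is vastly larger, but the loop has the same shape: play yourself, measure regret, adjust.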

Then, to deal with the extra complexity of six players, Brown and Sandholm came up with an efficient way for the AI to look ahead in the game and decide what move to make, a mechanism known as the search function. Rather than trying to predict how its opponents would play all the way to the end of the game (a calculation that becomes tremendously complex after even a few steps), Pluribus was designed to look only two or three moves ahead. This truncated approach was the real breakthrough, says Brown.
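The cost of looking ahead, and the effect of cutting the lookahead short, can be illustrated with a small sketch. This is an assumption-laden toy, not Pluribus's search: it uses ordinary depth-limited negamax on a perfect-information stone-taking game (take 1 to 3 stones, whoever takes the last stone wins), whereas Pluribus has to search with hidden cards. The function names and the "score 0 at the cutoff" rule are hypothetical.

```python
def negamax(pile, depth):
    """Value of the position for the player to move, searching `depth` moves ahead.

    +1 means a win is visible within the horizon, -1 a loss, and 0 means the
    search was cut off before the game was decided.
    """
    if pile == 0:
        return -1  # the previous player took the last stone, so we have lost
    if depth == 0:
        return 0  # horizon reached: fall back to a neutral estimate
    return max(-negamax(pile - take, depth - 1)
               for take in range(1, min(3, pile) + 1))


def best_move(pile, depth):
    """Return (stones_to_take, value) chosen by looking `depth` moves ahead."""
    best_take, best_value = None, None
    for take in range(1, min(3, pile) + 1):
        value = -negamax(pile - take, depth - 1)
        if best_value is None or value > best_value:
            best_take, best_value = take, value
    return best_take, best_value


if __name__ == "__main__":
    print(best_move(5, 3))  # three moves of lookahead are enough to find the win
    print(best_move(9, 2))  # two moves are not: the value at the horizon is 0
```

With five stones, three moves of lookahead suffice to see that taking one stone wins. With nine stones and a two-move horizon, the search cannot tell the moves apart, so everything depends on how good the estimate at the cutoff is. A truncated search trades certainty for speed in exactly this way.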

You might think that Pluribus is sacrificing long-term strategy for short-term gain, but in poker, short-term sharpness turns out to be indispensable.

For example, Pluribus was remarkably good at bluffing its opponents. The pros who played against it praised its "relentless consistency" and the way it extracted value from relatively thin hands. It was predictably unpredictable, a fantastic quality in a poker player.

Brown says this is only natural. We often think of bluffing as a uniquely human trait, one that depends on our ability to lie and deceive. But it is still an art that can be reduced to a mathematically optimal strategy, he says. "The AI doesn't see bluffing as deceptive. It just sees the decision that will make it the most money in that particular situation," he says. "What we show is that an AI can bluff, and it can bluff better than any human."
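A textbook simplification shows how bluffing reduces to arithmetic. Assume a single final bet: the pot holds `pot`, the bettor wagers `bet`, and the bettor holds either an unbeatable hand or a pure bluff. If a fraction x of the bets are bluffs, a call wins `pot + bet` with probability x and loses `bet` otherwise, so the caller is indifferent when x = bet / (pot + 2 * bet). This is a standard game-theory exercise, not Pluribus's algorithm, and the function names below are hypothetical.

```python
from fractions import Fraction


def indifference_bluff_fraction(pot, bet):
    """Share of bets that should be bluffs so that calling has zero expected value."""
    return Fraction(bet, pot + 2 * bet)


def caller_ev(pot, bet, bluff_fraction):
    """Caller's expected value of calling against a given bluff fraction."""
    win = bluff_fraction * (pot + bet)   # caller catches a bluff
    lose = (1 - bluff_fraction) * bet    # caller pays off a value hand
    return win - lose


if __name__ == "__main__":
    for pot, bet in [(100, 100), (100, 50)]:
        x = indifference_bluff_fraction(pot, bet)
        print(pot, bet, x, caller_ev(pot, bet, x))
```

For a pot-sized bet the formula gives one bluff for every two value bets. Bluff more often than that and calling becomes profitable for the opponent; bluff less often and folding does. No deception is involved, only the frequency that leaves the opponent without a good answer.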

What does it mean, then, that an AI has definitively beaten humans at the world's most popular form of poker? Well, as we have seen with past AI victories, humans can certainly learn from the computers. Some strategies that players are generally suspicious of (such as "donk betting") were embraced by the AI, which suggests they may be more useful than previously thought. "Whenever playing the bot, I feel like I pick up something new to incorporate into my game," said poker pro Jimmy Chou.

There is also hope that the techniques used to build Pluribus can be transferred to other situations. Many real-world scenarios resemble Texas Hold'em poker in the broadest sense. In other words, they involve multiple players, hidden information and numerous win-win outcomes.

Brown and Sandholm hope that their methods can be applied to areas such as cybersecurity, fraud prevention and financial negotiations.

So can we now consider poker "beaten"?

Brown does not answer the question directly, but he notes that Pluribus is a static program. After its initial eight days of training, the AI was never updated or upgraded so that it could better match the other players' strategies, and over the 12 days it spent with the pros, there was nothing to exploit. From the moment it started betting, Pluribus was on top.
