Was AlphaStar's superhuman speed a patch for the shortcomings of imitation learning?

By now, probably everyone has heard that AlphaStar, the AI built by Google's Deepmind, crushed professional players in the real-time strategy game Starcraft 2. This is unprecedented in artificial intelligence research. Still, I want to offer some constructive criticism of this achievement.

I will try to convincingly prove the following:

AlphaStar played with superhuman speed and precision.
Deepmind claims to have prevented the AI from performing actions that are physically impossible for a person. The developers did not succeed in this and probably know about the failure.
The reason AlphaStar plays at superhuman speed is most likely that it could not unlearn the spam-clicking habit it had picked up. I suspect the developers wanted to make the program more human-like but could not. It will take a while to get to this point, but it is the main reason I wrote the article, so please be patient.

First of all, I want to make clear that I am a layman. I have followed AI development and the Starcraft 2 scene for many years, but I do not claim to be an expert. If you notice any misconceptions, please point them out. I am just a fan, and I find all of this incredibly exciting. The article contains a lot of speculation, and I admit that I cannot definitively prove its main claims. With those caveats, if you read the article and disagree with me, please argue constructively. I would genuinely like to be proven wrong.

In the end, AlphaStar is an amazing achievement, in my opinion Deepmind's greatest to date, and I look forward to seeing how the program improves. Thank you for your patience. Let's begin.

AlphaStar's superhuman speed

David Silver, co-lead of the AlphaStar team: "AlphaStar cannot react faster than a human player, and it cannot make more clicks than a human player."

Here the lead AI designer makes an important statement (from 1:39)

In 2018, Serral dominated the Starcraft 2 scene. He is the current world champion and won seven of the nine major tournaments he entered, one of the most dominant runs by a single player in the history of Starcraft 2. He is very fast, perhaps the fastest in the world.

First-person view (from 13:00):

Take a look at his APM in the upper left. APM is short for actions per minute: in effect, it measures how quickly the player presses mouse and keyboard buttons. Serral never sustains more than 500 APM for long. There is one spike to 800 APM, but it lasts only a split second and is most likely the result of spam clicks, which I will get to shortly.

So the fastest player in the world can sustain an impressive 500 APM, while AlphaStar had bursts of 1500+. These inhuman spikes above 1000 APM sometimes lasted five seconds and were full of meaningful actions. 1500 actions per minute is 25 actions per second, which is physically impossible for a person. Note also that in Starcraft five seconds is a long time, especially at the very start of a big battle. If superhuman speed gives the AI an advantage in the first five seconds, it will easily win the battle through the snowball effect. Here is the start of an AlphaStar battle in the third game against MaNa (from 59:30):

AlphaStar holds 1000+ APM for five seconds. Here is the start of another battle, in the fourth game, with an off-the-charts 1500+ APM (from 2:11:32):

One commentator points out that the average APM is acceptable. But it is abundantly clear that these bursts far exceed human ability.
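The arithmetic behind these figures is easy to check. The short Python sketch below converts the APM values quoted above into actions per second and the average time available per action. The function names are illustrative and do not come from any Starcraft tool.

```python
def actions_per_second(apm):
    """Convert actions per minute to actions per second."""
    return apm / 60


def ms_per_action(apm):
    """Average time budget for a single action, in milliseconds."""
    return 60_000 / apm


# APM figures quoted in the text: Serral's sustained ceiling,
# his brief spike, and AlphaStar's bursts.
for label, apm in [("Serral, sustained", 500),
                   ("Serral, brief spike", 800),
                   ("AlphaStar, burst", 1500)]:
    print(f"{label}: {apm} APM = {actions_per_second(apm):.1f} actions/s, "
          f"{ms_per_action(apm):.0f} ms per action")
```

At 1500 APM each action gets 40 ms on average, compared with 120 ms at 500 APM.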

Spam clicks, APM, and surgical robot precision

Most players are prone to spam clicks: meaningless clicks that do not affect anything. For example, a person moves an army and for some reason clicks the destination several times. What is the effect? None. The army does not move any faster. One click would have been enough. So why do it? There are two reasons:

Spam clicking is a natural side effect of trying to click as quickly as possible.
It helps warm up the fingers.

Remember Serral? His real strength lies not in click speed but in accuracy. He not only has a very high APM but is also extremely efficient, as measured by effective APM: total actions per minute excluding spam clicks. From now on I will abbreviate effective APM as EPM. It is important to remember that EPM counts only meaningful actions.
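To make the distinction concrete, here is a minimal Python sketch that computes APM and EPM from a list of actions. It rests on a deliberately simplified assumption: an action counts as spam when it repeats the previous command on the same target. Real replay analyzers use their own rules, and the function name and action format here are hypothetical.

```python
def apm_and_epm(actions, duration_seconds):
    """Return (APM, EPM) for an ordered list of (command, target) tuples.

    Assumption: an action is spam if it is identical to the action
    issued immediately before it. This is a simplification.
    """
    effective = 0
    previous = None
    for action in actions:
        if action != previous:
            effective += 1
        previous = action
    apm = len(actions) * 60 / duration_seconds
    epm = effective * 60 / duration_seconds
    return apm, epm


# Five identical move clicks followed by one attack command, all in one second.
actions = [("move", (10, 10))] * 5 + [("attack", (12, 9))]
apm, epm = apm_and_epm(actions, duration_seconds=1)
print(f"APM: {apm:.0f}, EPM: {epm:.0f}")
```

In this toy example only two of the six actions change anything, so the EPM is a third of the APM.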

Here is a former professional losing his mind on Twitter after learning Serral's EPM:

His EPM of 344 is almost unreal. It is so high that I still find it hard to believe. The distinction between APM and EPM matters for AlphaStar too. If an AI can play without spam clicks, does that mean its peak EPM at times equals its peak APM? That makes the 1000+ bursts even more inhuman. Once we take into account that AlphaStar plays with perfect precision, its mechanical capabilities seem completely absurd. It always clicks exactly where it intends to. Humans misclick, while AlphaStar, at the moments that matter, acts four times faster than the fastest player in the world, with a precision a person can only dream of.

Virtually everyone in the community agrees that AlphaStar performed sequences that no human could replicate. It was faster and more accurate than is physically possible. The fastest professional in the world acts several times slower, and the accuracy is not even comparable.

David Silver’s assertion that AlphaStar can only perform actions that a person is capable of reproducing is simply not true.

Doing everything right, or just turning up the speed?

Oriol Vinyals, lead architect of AlphaStar: “It is important to master games that are recognized as ‘fundamental challenges for AI.’ We are trying to create intelligent systems that acquire the amazing capabilities we possess, so it is very important that they learn in as ‘human-like’ a way as possible. As cool as it may sound, maximizing performance in the game through things like very high APM does not actually help us measure the capabilities and progress of our agents, which would make the benchmark useless.”

Why does Deepmind want to restrict the agent to playing like a person? Why not just let it loose without any limits? The reason is that in Starcraft 2 superhuman mechanics spoil the gameplay. In this video, a bot attacks a group of tanks with a handful of zerglings, executing perfect micro. Zerglings can normally do almost nothing against tanks, but with the bot's micro they become far deadlier and destroy the tanks with minimal losses. With unit control this good, the AI has no need to learn strategy. Deepmind is not interested in building an AI that merely defeats Starcraft professionals; it wants to use this project as a stepping stone in general AI research. It is very sad that one of the project leads claims the agent is restricted to human abilities when it clearly violates those restrictions and wins its games precisely because of superhuman execution.

AlphaStar is superior to humans at unit control, a factor the game's developers did not account for when they carefully balanced the game. This inhuman control can undermine whatever strategic thinking the AI has learned. It may even make strategic thinking completely unnecessary. This is not merely a case of the program being stuck at a local maximum: if the game is played with inhuman speed and accuracy, then abusing perfect unit control is probably the best, most effective, and most reliable way to win, sad as that sounds.

Here is what one of the professionals said about AlphaStar's strengths and weaknesses after losing to it 1-5:

MaNa: “I would say that its best quality is unit control. In every engagement where both sides had roughly the same number of units, AlphaStar won. Its worst aspect, judging from the few games we played, was a stubborn refusal to upgrade. It was so convinced it would win with basic units that it upgraded practically nothing, and it eventually paid for that in the exhibition match [the last game against MaNa, which the AI lost - translator's note]. There were not many crucial decision-making moments, so I would say the reason for its victories was mechanics.”

Among Starcraft fans there is near-unanimous agreement that AlphaStar won almost exclusively because of its superhuman speed, reaction time, and accuracy. The pros who played against it seem to agree. A Deepmind employee played against AlphaStar before it was pitted against professionals, and he most likely agrees with this assessment too. David Silver and Oriol Vinyals keep repeating the mantra that AlphaStar can do only what a person can do, but we have already seen that this is simply not the case.

AlphaStar doesn't seem to "do everything right," as David says (from 1:38):

Something is clearly wrong here.

Why did Deepmind allow AlphaStar superhuman speed?

Finally, let's get to the main point. Thank you for reading this far. But first, a summary.

We know what APM, EPM and spam clicks are.
We have some understanding of maximum human capabilities.
AlphaStar's play directly contradicts the developers' claims about its limitations.
The Starcraft 2 community agrees that AlphaStar won thanks to inhuman unit control and did not even need excellent strategic thinking.
Deepmind's goal is not to build a fast bot, so the agent should not have played like that.
It is very unlikely that nobody on the Starcraft AI team realized that a person cannot reproduce bursts of 1500+ APM. Their Starcraft specialist surely knows more about Starcraft than I do. They work closely with Blizzard, which owns the StarCraft intellectual property. It is in their interest (see the previous point, as well as the statements by Silver and Vinyals) to make the bot act as much like a person as possible.

Considering all these points, why did Deepmind even allow the AI to bypass the limitations of the human body?

This is pure speculation on my part and I do not claim that I know the exact story. But I suspect that the following has happened:

At the very beginning of the project, Deepmind agreed on hard limits. At that point, AlphaStar was barred from the superhuman APM bursts we saw in the demonstration. If I were designing the system, I would set the following restrictions:

Maximum average APM over the whole game.
Maximum short-burst APM. I think it is reasonable to set it at 4-6 clicks per second. Remember Serral and his EPM of 344, which puts him head and shoulders above the competition? That is less than six clicks per second. Against MaNa, the program issued 25 clicks per second for extended periods. That is far faster than even the fastest human spam clicking, so the initial restrictions would hardly have allowed it.
Minimum time between clicks. Even with a cap on burst speed, a bot can fire off clicks extremely quickly within a brief moment of the allowed interval, which a person cannot do.

Some suggest adding randomness to click accuracy, but I suspect that would slow training too much.
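The three restrictions proposed above can be sketched as a simple check over a list of action timestamps. This is only an illustration of the idea, not anything Deepmind is known to have used. The burst limit of 6 actions per second comes from the text, while the default average cap (300 APM) and minimum gap (50 ms) are hypothetical placeholder values.

```python
def check_limits(times, game_seconds,
                 max_avg_apm=300.0,   # hypothetical placeholder
                 max_per_second=6,    # upper end of the 4-6 range in the text
                 min_gap=0.05):       # hypothetical placeholder, in seconds
    """Report which of the three proposed limits a list of action times violates."""
    times = sorted(times)
    avg_apm = len(times) * 60.0 / game_seconds

    # Largest number of actions inside any one-second sliding window.
    peak = 0
    start = 0
    for end, t in enumerate(times):
        while t - times[start] >= 1.0:
            start += 1
        peak = max(peak, end - start + 1)

    # Smallest gap between consecutive actions.
    gaps = [b - a for a, b in zip(times, times[1:])]
    smallest_gap = min(gaps) if gaps else float("inf")

    return {
        "average": avg_apm > max_avg_apm,
        "burst": peak > max_per_second,
        "gap": smallest_gap < min_gap,
    }


# 25 actions per second for five seconds inside a ten-minute game.
burst = [i / 25 for i in range(125)]
print(check_limits(burst, game_seconds=600))
```

The example burst keeps the game-wide average very low, yet it still breaks both the burst limit and the minimum gap, which is why an average cap alone is not enough.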

So the limits are set. What next? Deepmind then ran imitation learning on thousands of replays of high-level amateur games. At this stage the agent simply tries to imitate what people do, and it picks up spam clicking. This is very likely, because people spam click all the time. It is almost the most repetitive pattern in human play, so it must become deeply ingrained in the agent's behavior.

Initially, AlphaStar's peak APM bursts stay close to the established limits. But most of AlphaStar's clicks turn out to be spam clicks, so it does not have enough APM left for a proper fight. And without experimentation there is no learning. Here is what one of the developers said in yesterday's AMA; I think he gave a little too much away:

Oriol Vinyals, lead architect of AlphaStar: “Teaching an AI to play with low APM is quite interesting. In the early days, our agents were trained with very low APM and were not capable of micromanagement at all.”

To speed up learning, the developers raised the APM limits to allow short bursts. Here are the APM limits that applied to AlphaStar in the demo matches:

Oriol Vinyals: “In particular, we set a limit of 600 APM over 5-second intervals, 400 APM over 15 seconds, 320 over 30 seconds, and 300 over 60 seconds. If the agent issues more actions within these intervals, we discard/ignore them. These values are taken from human statistics.”

If you are not very familiar with Starcraft, these limits look reasonable, but they permit the superhuman APM bursts discussed earlier, along with superhuman accuracy.
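A minimal Python sketch shows how the quoted caps leave room for such bursts. The four window/cap pairs come from the quote above. Everything else is an assumption made for illustration: that each cap is enforced over a sliding window, that an action is dropped as soon as any window is full, and the class and method names themselves. Deepmind's actual implementation may differ.

```python
from collections import deque

# (window length in seconds, APM cap) pairs quoted in the AMA.
WINDOWS = [(5, 600), (15, 400), (30, 320), (60, 300)]


class ApmLimiter:
    """Hypothetical sliding-window limiter; the enforcement details are assumed."""

    def __init__(self, windows=WINDOWS):
        # Convert each APM cap into a maximum action count for its window.
        self.budgets = [(w, cap * w // 60) for w, cap in windows]
        self.longest = max(w for w, _ in windows)
        self.accepted = deque()  # timestamps of accepted actions

    def try_action(self, t):
        """Accept the action at time t (seconds) unless some window is full."""
        # Forget actions older than the longest window.
        while self.accepted and self.accepted[0] <= t - self.longest:
            self.accepted.popleft()
        for window, budget in self.budgets:
            recent = sum(1 for s in self.accepted if s > t - window)
            if recent >= budget:
                return False  # discarded, as described in the quote
        self.accepted.append(t)
        return True


# An agent attempting 25 actions per second (1500 APM) for five seconds.
limiter = ApmLimiter()
attempts = [i / 25 for i in range(125)]
accepted = [t for t in attempts if limiter.try_action(t)]
print(len(accepted), "of", len(attempts), "actions accepted")
print("last accepted action at", accepted[-1], "s")
```

Under these assumptions the 5-second window allows 50 actions, and nothing stops the agent from spending all of them in the first two seconds. That is 25 actions per second, or 1500 APM, sustained for two seconds without breaking any cap.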

There is a limit to how fast a person can spam click. Spam clicks are usually move or attack commands issued by clicking on the map. Try for yourself how fast you can click the mouse. The agent learned spam clicking from human players, so it will not spam faster than a person. That leaves the additional, superhuman-speed APM "free" for experimentation.

This free APM is used to experiment during engagements, which come up often during training. AlphaStar starts to learn new behavior that leads to better results, and the share of spam in its clicks falls.

If the agent learned to act efficiently, why did Deepmind not go back to the original, stricter, more human-like APM limits? Surely they realized that the AI was displaying superhuman abilities. The Starcraft community has almost unanimously recognized AlphaStar's inhuman micromanagement. The pros said in the AMA that AlphaStar's main strength is unit control and its main weakness is strategic thinking. Deepmind's developers must have reached the same conclusion. The reason is probably that the agent could not get rid of spam clicks. It acts cleanly most of the time, but it still regularly lapses into spam clicking. This is visible in the first game against MaNa, when AlphaStar moves up the ramp (from 39:30):

Watch the blue unit-selection circles closely.

The agent spam-clicks move commands at 800 APM. It never fully unlearned this human quirk, even though these actions are completely useless and eat into its APM budget. The bug is especially dangerous in big battles. Apparently the APM limit was raised to paper over the flaw and let the agent function normally at such moments.

Why does this matter?

I suspect that the agent could not get rid of the spam clicks it learned during imitation learning from human games. Deepmind had to tinker with the APM limit to make experimentation and further progress possible. This had the unfortunate side effect of superhuman play: in essence, the agent breaks the rules, because it can execute strategies that were originally forbidden to it.

This matters because beating the professionals this way directly contradicts the mission Deepmind has repeatedly stated. That is why the following chart leaves a sour taste of hypocrisy:

Deepmind posted this image on its blog.

The chart looks designed to mislead people unfamiliar with Starcraft 2. It presents AlphaStar's APM as acceptable. Look at MaNa's APM and compare it with AlphaStar's. Although MaNa's average is higher, AlphaStar's tail extends far beyond human capabilities. Note that MaNa's APM peaks at around 750, while AlphaStar's exceeds 1500. Now consider that more than half of a human's APM is spam clicks, while AlphaStar's consists of perfectly precise effective actions.

Now take a look at TLO's APM. The tail stretches out to 2000. Think about that for a second. How is it possible? It is made possible by a trick called "rapid fire". TLO is not clicking super fast. He simply holds a button down, and the game registers it as 2000 APM. The only thing you can do with rapid fire is spam at an insane rate. That is all. TLO just uses it for some reason. But the result is that AlphaStar's superhuman APM bursts are masked, and its figures look realistic to people who are not familiar with Starcraft.

Deepmind's blog post makes no attempt to explain TLO's absurd numbers. If they are not going to explain TLO's inflated figures, they should not include them in the chart. Period.

Statistics like these come dangerously close to lying. Deepmind should hold itself to a higher standard.

Source: https://habr.com/ru/post/437796/