Russian AI Cup 2018, history 9 places

so

Like last year, my name is Andrei Rybalka, only this time I am 33. And, since I was in the top ten, I decided to share my approach to writing a game bot for the Russian AI Cup 2018 again.

This time the task was football. The task itself was somewhat reminiscent of the 2014 RAIC, when it was hockey, but the solution was completely different.

The world this time was three-dimensional and this three-dimensionality was used in full. The game itself was most reminiscent of the Rocket League .

I will not bore the introductory part, it is easier to show how it looked. You can watch the games on the website or on video:

So that life does not seem too sweet to us, the developers, in addition to the non-determinism of the game world, also crushed the game tick into 100 sabtics, which initially put an end to exact simulation for most participants, and even more so for me, intending to write a bot on slow java.

Also, I must say that the championship is divided into rounds, which probably would be more correct to call tours.

Briefly about the tournament system

For a start, 2 weeks to develop. Then passes the first round. 300 of the best of it pass on.

After the round, the rules of the game change (and specifically, nitro is added to the game) and another 2 weeks are given, after which the second round passes.

Then the rules are again complicated (the third player is added), another week is given and we play the final.

But this is not the end. After the finals, there is still a week, at the end of which the sandbox just stops, and the 6 best ones in it, excluding the prize winners, are also awarded. The fundamental difference between the sandbox finals and the championship finals is that in the sandbox games are created in a random format, and not just in the format of the current round.

Participation stories

The technical part will be lower. Who is not interested in the story, you can scroll down until it becomes good.

First round

Began, like most, from the week of the beta. I spent a lot of time, 4+ hours each evening.
Before filling in the first version, I went through several iterations.
code, until it starts to beat the previous version - we collect - we consider the current version of the previous version - we repeat .

I didn’t hurry with the first one and it happened a few days before the round. And, since, until now, my bot has not played with anyone, I had no idea what kind of world I am and what positions I can claim in the rating. When I saw that I had won more than 100 games in a row without a single loss, I calmed down.

In general, the first loss I had, it seems, in 12th place, by time-limit, and the first game lost in a row is already in the top 10.

In short, I realized that I have chances to get into the second round, where the top 300 passes.
Therefore, I did not chase the position in it and did not flood anything for the round any more, but simply continued to work.

At that time I saw that there was still a lot of room for improvement and without connecting nitro (which appeared after the 1st round), so I focused on the main part of the strategy, realizing that before the second round I had more than 2 weeks more and I would have time to fix the nitro.

Second round

The first week I was actively programming, but still not connected nitro. I wanted to do this in the second week. But everything turned out differently, because by the end of the first week I had slept with pneumonia. I was not in a position to program, so I just filled in what was, and, one can say, active participation in the championship for me at this place was over.

Over the next 3 weeks before the end of the championship, he worked on the strategy for a total of 20 hours.

As a result, in the second round, my bot basically did not know that there was nitro in the game, but somehow it still took 16th place.

The final

The third player was added to the final.

I wrote on slow Java, and not in C ++, as 7 out of 8 people above me in the rating, and my bot often fell before the timeout, so with the advent of the third player, it began to fall in 100% of the games. Fortunately, sandbox games are created in a random format, so I automatically
he lost only every third game and therefore flew down not too much. It seems to have dropped to 18th place.

If we do not consider programming the correction of coefficients in the evaluation function and the launch of tests, then for the first time, after the onset of the disease, I sat behind the bot the evening before the final. He added a very simple nitro, directed strictly upwards, made it so that the two attackers stopped running at the same point and colliding there with each other, and cut everything they could for the 3x3 game, starting from the depth of the calculation and ending with the simulation accuracy, so that Only the bot did not die on timeout.
In this form, it played the final.

In the break between the halves of the final, I sat down behind the bot again and spent a good 10 hours. For the most part, the edits concerned the dynamic selection of coefficients, the early interruption of genetics, etc. In general, I was looking for a balance between accuracy and depth of miscalculation and speed.
In addition to fighting performance, I made a couple of changes:

Sent the far (relative to the ball) attacker to a point in the middle between the ball and the opponent's goal
A little corrected nitro (the description will be in those. Part). It was still extremely simple, but it became much more efficient to work.

Total, having banished tests and having seen account 395: 254 against the previous version, on it calmed down. This allowed me to take the 9th place in the final.

Sandbox final

I continued to hurt and did not work on the bot for most of the week. A day before the end, I saw that several people had filled up the latest versions, which often win from me and can be thrown out of the sandbox prizes. So I spent another couple of hours.

The only major change is that I dug up my branch in Git three weeks ago, in which I had a simulation of the movement of the enemy with my simplified algorithm. At that time, it worked poorly, but I brought it to mind, drove the tests, saw
that wins from the previous version almost doubled and flooded. So, at the time of the stop, I was in 10th place in the general table, which corresponds to the 4th place in the final sandbox.

How it all works (technical part)

I apologize in advance if there are inaccuracies in terminology. Also, I am writing from memory, so it is possible that somewhere I will describe not the final version.

So, based on genetic algorithms. A chromosome consists of several genes:

Fractional number in the range -PI..PI, specifying the direction of movement
An integer in the range 0..10, specifying the number of repetitions of this action
Fractional number from 0 to 1. If the value is above a certain threshold, make a jump

A genotype may include a different number of chromosomes, but so that the total number of actions (including repetitions) is 40.

Initially, I create several dozen random genotypes. I add to them:

Trajectory right on the ball
Straight trajectories in all directions, only 10 pieces with an offset of 36 degrees
A genotype that simply does nothing (without it, the bot always runs somewhere, even if it is already at the optimal point)
The best genotype from the previous tick

Then it is all simulated and run through the evaluation function. N best genotypes "survive" and are cloned M times with mutations. During mutation, each gene changes in a given range with a probability of 10%. Well, this is repeated for several generations.
There is no crossing, in this task I do not see any sense in it.

Total, the maximum possible number of trajectories per tick per player was about 800, but in fact, in most cases it was much less, because in some cases (for example, when in the near future we will definitely not be able to touch the ball), the movement of players has been replaced by simple heuristics. In addition, N, M and the number of generations depended on the situation on the field. First of all, from the distance to the ball. Also, the miscalculation is interrupted prematurely (but not earlier than the 5th generation), if a trajectory is found with an acceptable estimate.

Macro

The goalkeeper runs to the point in front of the center of the goal. My tests showed that he plays the best in me standing in front of the goal, and not inside them, like most players in the top.

The position of the point deviated from the center depending on several factors: the position and direction of the ball's flight, the point the ball hit my goal, if a goal is planned, the location of the nearest attacking opponent, etc.

If the ball is on the side of the enemy and flies towards its goal, we can go for nitro.

If my goalkeeper can hit the ball earlier than my attacker (plus a few more conditions), then the attacker ignores the ball and runs to a point in the middle between the ball and the opponent's goal. I went through a lot of options for exactly where to run. In my case, this one worked best.

Otherwise, if the ball is too far, the attacker runs in a straight line to the nearest point of contact of the ball with the floor at which he can intercept the ball (if we don’t have time to reach the first point of contact, check the next one, etc.)

Otherwise (when the ball is reached), the attacker goes where the evaluation function will tell him. Yes, and also, if the nitro is not far away and we can pick it up, we select it.

In a 3x3 game, the second attacker is more likely to strive for the ball and with a lesser run forward, waiting for a pass from the goalkeeper. But if you still run, then the point is chosen another - closer to the center line.

I also each tick one-time simulated the ball 100 ticks forward with 100 mikrotikami (with caching).

This trajectory has been used in many places. For example:

To determine the points of contact of the ball with the floor
To find out if the ball threatens my goal and whether the goalkeeper should be switched to simulation mode

The same exact trajectory was used in the simulation of the trajectories of the players, so as not to recalculate the movement of the ball each time. But only until the first collision of the ball with any football player.

By the way, writing Footballist was too lazy, the words Player, Robot were reserved by the strategy,
so my wrapper class was called Dude :)

Simulation

In most cases, it took place with one microtic, but in some situations switched to accurate mode with a large number of microtics (at the beginning by 100, then reduced to 50 in the game 2x2, because the tests showed that the difference in the results is within the margin of error, and to 10 in 3x3, for otherwise he flew away in timeouts).

In accurate mode, I switched either at the time of jumping, or being so close to the ball that a collision is possible at the next tick. Moreover, here, too, was the mass of small crutches, hacks, optimizations, in which I myself can not figure it out.

For example, the flying ball was still simulated with 1 microtic, but if after the next microtic I saw that a collision had occurred, he would roll back to the previous position and simulate it again with greater accuracy.

In addition, I also simulated other players (both ours and others), if they were in the air (and, consequently, their trajectory is easier to predict), or were close to the ball. For opponents in the final version, a simplified version of my own decision-making strategy was used, which was run every 5 ticks (more often it did not allow performance).

When simulating each character, I calculated myself, the ball and other players for 40 ticks forward (my limit on the number of actions in the genotype) and then for the same number of ticks, only the ball simulated.

Nitro

Simple to indecency.

In the final version, nitro is always included, if it is, if the player is in the air, and if he did not hit the ball in the last few ticks.

At the beginning, I always directed the nitro up strictly, but then I tried to experiment and the best thing was to work the option to go straight to the center of the ball. I also tried options so that the direction of nitro was chosen by genetics.

Worked much worse. Perhaps due to lack of depth busting.

Evaluation function

The sum of scores on each tick with a 2% attenuation per tick.

The biggest weight, of course, had a goal. Several things influenced its weight:

The distance from the ball to the enemy goalkeeper at the time of the goal (the farther - the better)
Y coordinate of the ball (because at the top of the gate it is much harder to beat off)
The speed of the ball on the Z axis (which is directed to the enemy gate)

When attacking me, everything is the same, only with the opposite sign.

Further, for the attacker, the overall score depended on:

Distance from the player to the ball (so that he runs to the ball, even if he cannot hit him)
Penalty for a jump (to jump only if it brings so many points that they exceed this penalty)
Distances on the next simulation tick from the ball to the opponents
The coordinates of the Y ball (the higher it is, the less chance the enemy will intercept it)
Cosine of the angle between the direction of the ball and the center of the enemy gates
Flag, did I touch the ball
Flag, whether the enemy touched the ball
Bonus for the selection of nitro

Also, there was a small bonus for hitting an enemy player. Although in fact, although it happened, but rarely.

For goalkeeper:

Bonus for distances to the ball, ball speed in Z, ball position in Y
Penalty for jump
The penalty for finding the ball in the area in front of my goal
The distance to the enemies and to my attackers was taken into account (so that the ball flew away from the enemies, but, if possible, flew closer to my attackers)
And a few little things.

Machine learning

There was very little in one of the branches of the gita as an experiment. But it seems to me worth mentioning anyway. I did not have time to bring it to mind (and I’m not sure it made sense).

In general, I tried with his help to predict whether the enemy can intercept the ball, based on the positions and speeds of the enemy and the ball. Planned to use this in the evaluation function. Fine trajectories that can be intercepted.

But I immediately understood that I could not afford not only a neural network, but nothing at all serious, because it would have to be performed 80 times per trajectory. Well, even 40 or 20, if not every tick, but still, I didn’t have any time reserve at all, so I didn’t even consider these options.

Here is what I did:

I drove a few games with a modified bot, in which, when searching for the trajectory, I saved data about myself and the ball, as well as the flag of whether the trajectory at which I intercepted the ball was found.

I considered all coordinates relative to a football player. Those. I always had it in the coordinate [0,0,0], so I saved only 10 fields: the relative position of the ball, the velocity vector of the ball, the velocity vector of the soccer player, the binary flag of interception. I saved dataset only for the central part of the field, since I understood that simple algorithms would not pull the registration of boards either.

Then I fed this dataset to a DecisionTreeClassifier with max_depth = 7. The trained tree gave an accuracy, as I recall, of the order of 90%.

Next, I exported the tree to a set of if-s (which DecisionTree is essentially).

It looked something like this:

public static boolean predict(double dude_vel_x, double dude_vel_y, double dude_vel_z, double ball_rel_pos_x, double ball_rel_pos_y, double ball_rel_pos_z, double ball_vel_x, double ball_vel_y, double ball_vel_z) { if (ball_vel_z <= 6.4765448570251465) { if (dude_vel_y <= -6.087389945983887) { if (ball_vel_z <= -20.188323974609375) { if (dude_vel_x <= 13.032730102539062) { if (ball_rel_pos_y <= -1.1829500198364258) { return ball_vel_y <= 18.906089782714844; } else { return false; } } else { return true; } } else { // ............................

At this stage, I drove the tests, did not see the improvement, and postponed the proceedings until later, which, because of my adventures, did not come.

Shredinbag

Somewhere after the first round I caught this rare animal.

Who does not know, this is a bug that is found only by reading the code, and having found it, the developer wonders how the program could work at all. And in my case, she also kept in the top 10.

In general, the bug was that a constructor was called in the gene copy function, in which an optional argument was passed that contains the value of this gene. In the absence of this argument, the value was chosen randomly. Thus, when trying to copy a gene, instead of a copy, it created a new random sample.

In fact, instead of genetics, I had a random search, since each tick simply generated several hundred random trajectories and selected the best one.

After the correction (consisting in adding 2 characters to the code), it became about 3 times better to play.

Ritual dances

At some point in time, I noticed that the players sometimes bounce without a reason, being far from the ball, despite the penalty.

The explanation turned out to be such that I counted the moment of the jump with an accuracy of 100 microtics. And sometimes it turned out that just at the time of the jump there was a collision of a ball with a barbell. And depending on the accuracy of the calculation in this particular tick, the intended trajectory either led to a goal or not.

Roughly speaking, the ball flies to the gates of the enemy and hits the post. my footballer, running at the other end of the field, simulates a trajectory without a jump (with 1 micro tip) and sees that the ball misses the goal.

Then another trajectory comes across, with a jump exactly at the moment of hitting the ball against the bar. And since a tick with a jump, I consider with an accuracy of 100 microtics, not only for a football player, but also for a ball, the calculated angle of the ball’s rebound differs from the angle obtained in the 1-microtic trajectory, and it may happen that the ball in this more accurate trajectory falls into goal.

Consequently, this particular trajectory will be chosen and the bot will jump.

In general, by performing a ritual dance with jumping, the players scored a goal :)

Killer feature

She's not

Testing

I drove endless games in 8 streams (4 on a computer and on a laptop). The number of games was chosen so that it was statistically significant.

With a significant improvement strategy could be satisfied with half a thousand heads in the amount
with smaller edits, left overnight and then the bill went into the thousands.

Genetic selection of constants

I tried before the first round. Nothing really gave for the reason that for genetics you need to play a tournament from a large number of games.

I tried to play games of 100,000 ticks, but this was not enough. With a small difference in strength (and usually this is the case when selecting constants), for 100k ticks the winner depends too much on the case. You need to play much more to be sure of the winner. And I could not afford to leave the selection for a day or more, so I refused this idea.

Finally

Traditional thanks to the organizers. The task was interesting. It is a pity that I just had to miss almost half of the championship and really did not do anything for nitro or for three players.

As a result, I watched to the very end in the sandbox how my strategy wins in 2x2 mode without nitro with a score of 13: 2 for some Mr.Smile, who finished 3rd in the final, and after a few games loses to him 12: 2 in 3x3 mode with nitro :)

And of course, the video from my samopny visualizer:

Only probably have to say goodbye to this visualizer in future championships.
For each time more and more convinced that if you claim to normal places, the only option is to write on ...

... well, you understand.

Tired every time to rest on the sluggishness of Java and cut the power of the strategy to meet the allotted time.

I hope someone has found something interesting or useful in my opus with an autobiographical note :)