A Smart X-Pilot Agent
Maneuver ○ Defend ○ Attack
Strategy
Our strategy is simple: listen to each advisor, and choose the best advice.
To develop this concept, we wrote a production system for each of the attacking, defensive, and maneuvering board members, implemented a queue genetic algorithm to train each production system, and designed a neural network that lets the agent decide which advice best fits its current situation.
In each production system, we wrote the general framework for what we thought each board member should recommend to the agent, but inserted variables throughout so that our genetic algorithm could sharpen each production system, finding strong constants for each variable that would be impractical to tune by hand. The queue genetic algorithm, a variant of the standard genetic algorithm, did exactly that, training each production system to its highest level of quality.
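As an illustrative sketch in Python (the rule, the constant names, and the sensor values below are our own placeholders, not the actual rules from our production systems), a single parameterized production rule might look like this:

def maneuver_rule(wall_distance, heading_error, constants):
    # `constants` holds the GA-tuned values encoded in the chromosome.
    # C_TURN_DIST, C_TURN_RATE, and C_THRUST_POWER are hypothetical names.
    if wall_distance < constants["C_TURN_DIST"]:
        # Too close to a wall: turn away at a GA-tuned rate.
        return ("turn", constants["C_TURN_RATE"] * heading_error)
    # Otherwise keep moving with a GA-tuned thrust level.
    return ("thrust", constants["C_THRUST_POWER"])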
The genetic algorithm trained each production system in an environment suited to what it was meant to learn. The attacking production system trained against a bot in an edge-wrapped map (fly past the top of the map and you reappear at the bottom) to make training more efficient, since the attacker isn't concerned with hitting walls and shouldn't be trained to account for them. The defender trained in a map that forced it to stay in a central region by making the center a potential well; eight shooters placed on the perimeter of the well fired at the agent inside, so the agent only had to account for incoming bullets. Lastly, the maneuvering chromosome trained in a standard four-walled map, learning to avoid the walls.
We then used a standard neural network to train our agent to judge its situation and choose the right strategy. We manually built an annotated training set that we believed would be appropriate and trained the network to match it.
Aim
The aim of this project was to build a strong fighter pilot agent in XPilot, a virtual environment well-suited to studying artificial intelligence. We planned to construct what we call an "advisory board," a set of three algorithms, each with a specific heuristic it pushes to the agent. The advisory board was to be made of an attacking, a defensive, and a maneuvering algorithm. The attacking algorithm was to recommend the best next action for commencing an attack, the defender to recommend a bullet dodge, and the maneuvering algorithm to keep the agent from crashing into walls. Depending on its environment, the agent was to assess which strategy it needed most at each frame, choose the ideal recommendation from the advisory board, and execute it.
Genetic Algorithms
A queue genetic algorithm operates like a regular genetic algorithm, using stochastic processes to select parents, creating offspring through crossover, and applying mutations at a low rate. The key difference is how the new population of offspring replaces the parent population. Rather than accumulating the same number of offspring as there are parents and wiping out all parents at once, a queue genetic algorithm divides the process into discrete steps of arbitrary size. Say there is a population of P chromosomes at any given time; we use that population to create O offspring. These offspring then replace the O oldest individuals in the parent population. Thus, after P/O generations, the entire original parent population will have been replaced. This allows new offspring chromosomes with high fitness to begin producing offspring of their own without waiting for an entire population of P offspring to be made.
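A minimal sketch of one replacement step, assuming the population is kept ordered oldest-first and that selection, crossover, and mutation are wrapped in a make_offspring helper (both assumptions are ours):

def queue_ga_step(population, fitness, make_offspring, O):
    # Breed O new chromosomes from the current population.
    offspring = [make_offspring(population, fitness) for _ in range(O)]
    # Replace the O oldest individuals (the front of the list), so after
    # P/O steps the original population has been fully turned over.
    return population[O:] + offspring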
After implementing our queue genetic algorithm, we used it to sharpen all three production systems. We used multi-point crossover with cut points aligned to the variable boundaries in the chromosome, ensuring that if a chromosome found a strong variable value, crossover wouldn't split that variable's bitstring in half and combine it with half of another parent's bitstring, losing the strong value. In addition, we set our queue genetic algorithm to run multiple bots in parallel, which let us evaluate each batch of offspring simultaneously, put them in place of the oldest parent chromosomes, and repeat; this made training many times more efficient and gave us more time to focus on other aspects of the project.
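A sketch of the boundary-aligned crossover, assuming each chromosome is a bitstring and `boundaries` lists the start index of each encoded variable (the names and layout are illustrative):

import random

def variable_aligned_crossover(parent_a, parent_b, boundaries):
    # Cut only at variable boundaries, so each variable's bitstring is
    # inherited whole from one parent and strong values survive intact.
    ends = boundaries[1:] + [len(parent_a)]
    child = []
    for start, end in zip(boundaries, ends):
        donor = parent_a if random.random() < 0.5 else parent_b
        child.extend(donor[start:end])
    return child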
Neural Network
A neural network seemed to be the right tool for deciding which board member to listen to because it is an extremely flexible multivariate function approximator, mapping a set of inputs to a set of outputs in whatever way you train it to, depending on its hidden layers. We gave the network five inputs: the distances to the wall in front of, behind, and in the direction of motion of the agent, the distance to the closest enemy, and the distance to the closest bullet. We trained the network with backpropagation using the following annotation rule: if the agent's distance to a wall is below a threshold value, listen to the maneuvering board member; else, if the agent is near the enemy, use the attacker's advice; else, if the agent is near a bullet, defend; else, listen to the attacker.
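The annotation rule can be written directly as a labeling function; the threshold values here are placeholders, not the ones we actually used:

def choose_advisor(front, behind, motion, enemy_dist, bullet_dist,
                   wall_thresh=50.0, enemy_thresh=200.0, bullet_thresh=150.0):
    # Distances to the walls in front, behind, and along the motion vector,
    # plus distances to the closest enemy and the closest bullet.
    if min(front, behind, motion) < wall_thresh:
        return "maneuver"
    if enemy_dist < enemy_thresh:
        return "attack"
    if bullet_dist < bullet_thresh:
        return "defend"
    return "attack"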
Evolution
Maneuver
This moving average shows the chromosome's improvement at moving around the map without colliding. The bot was awarded one point for every frame alive, plus half its speed each frame to motivate it to move; 280 points were deducted for every collision.
Attack
This moving average shows the chromosome's improvement at attacking an enemy ship in the map. The bot was awarded 200 points for every kill, with no deductions. The enemy bot in the map moved around and shot back.
Defense
This moving average shows the chromosome's improvement at defending against and dodging enemy bullets. Ten points were awarded for every frame alive, and 280 were deducted for a death. Eight enemy bots in the map constantly tried to shoot the chromosome.
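The three reward schemes above can be summarized in one sketch; the per-frame record fields (speed, collided, killed_enemy, died) are assumed bookkeeping, not our actual logging format:

def maneuver_fitness(frames):
    # +1 per frame alive, +half the speed each frame, -280 per collision.
    return sum(1 + 0.5 * f["speed"] - 280 * f["collided"] for f in frames)

def attack_fitness(frames):
    # +200 per kill, no deductions.
    return sum(200 * f["killed_enemy"] for f in frames)

def defense_fitness(frames):
    # +10 per frame alive, -280 for a death.
    return sum(10 - 280 * f["died"] for f in frames)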
MANEUVER
The adjacent video shows how the chromosome evolved to maneuver around the simple map.
ATTACK
The adjacent video shows how the chromosome evolved to shoot down an enemy agent in the simple X-Pilot map.
DEFEND
The adjacent video shows the evolution of a chromosome to defend against and dodge bullets shot at it by eight enemy agents.
GAME PLAY
The adjacent video shows the neural net in action, switching between maneuvering, attacking, and defending based on the environment variables.
Training
After all three chromosomes were evolved, it was time to stitch them together into one program. This is where the neural net steps in. We generated 1000 dummy input instances with ground truths; each instance consists of the five environment variables mentioned above. The net's job was to decide which chromosome to activate in any given frame. To put it simply: stay in attacking mode by default; if near a wall, maneuver; if near an enemy agent's bullet, defend.
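Generating the dummy training set might look like the following, reusing a labeling rule such as the choose_advisor sketch above (the uniform sampling range is an assumption):

import random

def make_training_set(n=1000, max_dist=500.0):
    data = []
    for _ in range(n):
        # Five environment readings: three wall distances, enemy, bullet.
        inputs = [random.uniform(0.0, max_dist) for _ in range(5)]
        data.append((inputs, choose_advisor(*inputs)))
    return data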
The architecture of the net is simple: one input layer with 5 perceptrons, one hidden layer with 4 perceptrons, and one output. The output indicates which chromosome to activate. The net was trained with backpropagation on all 1000 inputs over 100 epochs with a learning rate of 0.1, and it achieved an accuracy of 100%.
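A minimal sketch of such a 5-4-1 network trained by backpropagation; the sigmoid activations, the squared-error loss, and the encoding of the single output into advisor choices are all our assumptions:

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(5, 4)); b1 = np.zeros(4)  # input -> hidden
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)  # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.1, epochs=100):
    global W1, b1, W2, b2
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = sigmoid(x @ W1 + b1)          # hidden activations
            o = sigmoid(h @ W2 + b2)          # network output
            d_o = (o - t) * o * (1 - o)       # output delta (squared error)
            d_h = (d_o @ W2.T) * h * (1 - h)  # hidden deltas
            W2 -= lr * np.outer(h, d_o); b2 -= lr * d_o
            W1 -= lr * np.outer(x, d_h); b1 -= lr * d_h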
OUTCOME
Overall, our project was a success. We saw strong improvement in the attacking chromosome over time and decent improvement in the maneuvering chromosome. The defending chromosome improved somewhat over the generations; evading a moving projectile is a very difficult task, and we aim to work more on that production system. The neural network, however, operated seamlessly, and we were able to produce a fully operational advisory-board-driven agent. The agent performed very well relative to earlier intelligence designs our group has worked on, and we intend to continue its development in the future.