March 2, 2018
Eric Rose, PhD Candidate
In a previous Laber Labs blog post, Marshall Wang discussed using reinforcement learning to teach an AI to play the game Laser Cats, which has since been rebranded as Space Mice. In this scenario, the AI knows nothing about the mechanics of the game but learns to play far more effectively than any human player could. To achieve this level of skill, the computer player records the current state of the game, the action it took, and a reward or penalty based on the resulting next state of the game. The computer then learns the optimal strategy for maximizing its reward.
However, in some complex situations this approach can be difficult for a computer. For example, it might be hard to design a good reward function, or the decision about which action is best might be very complicated. When a human can play close to optimally, it can be far easier for the computer to learn to replicate how the human plays the game. This is called imitation learning!
Let’s use the game Flying Squirrel (playable here) to demonstrate how imitation learning can be used to teach a computer player how to play a game. In this game, the player controls the squirrel and is playing against a clock. The goal is to traverse the hills as fast as possible so you can complete the level before you run out of time. At first, you have only one possible choice: to dive or not to dive. Diving adds a downward force to the squirrel. (Insider tip: to move as fast as possible, you want to dive so that you land on the downward sloping part of the hill, continue diving as you move downhill, and then release the dive button right before you reach the uphill so that your momentum lets you jump off the hill and fly through the air. It’s pretty cool!) As you move through the game you gain special abilities and obstacles appear that you need to dodge, but we will limit our discussion to the simplest case.
In imitation learning, the computer is watching! In each frame, it records features that summarize the current state of the game (such as how high you are above the ground, your velocity, your direction, and features summarizing the shape of the hill in front of you) as well as the action you, the human player, took in this state. This record keeping turns the problem of how to play the game into one of classification. The computer takes the state of the game as input, searches its database of user experiences, and outputs the choice to dive or not to dive. Many classification algorithms can be used for this problem. For Flying Squirrel, we used k-nearest neighbors to teach the computer to play. This works by taking the current state of the game and finding the k past states that are most similar to it. We can then look at which action the human chose in each of these past states and pick the action that was most common among those k states.
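To make the k-nearest neighbors idea concrete, here is a minimal sketch in Python. It is not the code behind Flying Squirrel: the three features (height, vertical velocity, slope of the hill ahead), the action labels, the recorded examples, and the use of Euclidean distance as the measure of similarity are all assumptions made up for illustration.

```python
import math
from collections import Counter

def knn_action(state, demonstrations, k=3):
    """Return the most common human action among the k recorded
    states closest to `state`."""
    # Rank every recorded (state, action) pair by Euclidean distance
    # to the current state and keep the k closest ones.
    nearest = sorted(demonstrations,
                     key=lambda pair: math.dist(pair[0], state))[:k]
    # Majority vote over the actions the human took in those states.
    votes = Counter(action for _, action in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical recorded frames:
# (height above ground, vertical velocity, slope of hill ahead) -> action
demonstrations = [
    ((5.0, -1.0, -0.8), "dive"),
    ((4.0, -2.0, -0.6), "dive"),
    ((6.0, -1.5, -0.7), "dive"),
    ((1.0, 2.0, 0.9), "no_dive"),
    ((0.5, 1.5, 0.7), "no_dive"),
    ((1.5, 2.5, 0.8), "no_dive"),
]

# The computer player would call this once per frame with the current state.
print(knn_action((4.5, -1.2, -0.7), demonstrations, k=3))
```

In this toy data the query state sits next to the three frames where the human dived, so the vote comes out as "dive". With two possible actions, an odd k avoids tied votes. In practice the features would probably need to be put on a comparable scale first, since otherwise the feature with the largest numeric range dominates the distance.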
To see this in action, you can play the game and switch to watching a computer player play using imitation learning based on the data that was just collected on you!
Eric is a PhD Candidate whose research interests include machine learning and statistical computing. His current research focuses on sample size calculations for dynamic treatment regimes. We asked a fellow Laber Labs colleague to ask Eric a probing question.
Imagine that you’re a Game of Thrones character. What is your house name, its sigil, and the house motto?
I guess my house name would be House Rose since I’m pretty sure they’re all the same as the family name. I also didn’t read the books and have only seen a couple of episodes so that may not even be true. Our sigil would contain just a vinyl record. If my imaginary Game of Thrones character is anything like the real me, music is probably going to be a huge part of his life. Then again if he was anything like the actual me he probably wouldn’t survive for very long. Our house motto would be along the same lines and would be a line from one of my favorite songs by the Drive-By Truckers that I also think could apply to Game of Thrones. “It’s great to be alive!”
This is Eric’s second post! To learn more about his research, check out his first article here!