A good data set is a gold mine for a statistician. A good data set provides an opportunity to put one’s favorite statistical methods to the test or to try out new ones. In fact, a good data set is worth even more: it presents novel challenges and questions, which can in turn inspire creative approaches to problem solving that meld statistical, mathematical, and other ideas that may have never been realized otherwise.
At Doran’s Lab, the undergraduate data science research group at Laber Labs, we have found our gold mine, and it sits atop the worldwide phenomenon League of Legends. Released by Riot Games in 2009, League of Legends (or, “League”) is a 5-vs-5 competitive online game with a massive following and a professional scene of unprecedented maturity. In keeping with Riot’s openness toward their player base, they maintain a public API that lets anyone with a free account make requests to obtain detailed game data. In our case, we were given permission to make requests at a high volume, which has allowed us to fill up almost nine terabytes of hard drive space in about a year; we collect over 100,000 games every day and have logged data from well over 3 million accounts in North America. This data is ripe for insights.
After a year of collaborative work we’ve still only scratched the surface of the possibilities presented by the data, which contains information as detailed as (1) minute-to-minute updates on player location, gold, and experience (which are the primary resources in the game) and (2) exact times and details of important game events like item purchases, champion, building, and elite monster kills. If you’re interested in reading about these projects, check out our awesome website at doranslab.gg.
Since our undergraduate data analysts have already done an excellent job writing up their work on past projects, I’d like to highlight three examples of unanticipated learning that resulted from a challenge in our data. These are a few tiny examples of times when the data demanded a new approach, although I stress that there were many more challenges and the bulk of the good stuff is contained in the linked articles themselves!
These are just a few examples, and there are more every day for everyone at Doran’s Lab. If you’d like to jump in to experience the magic yourself, check out our publicly available, curated datasets here and follow us on Twitter (@DoransLab) to hear more soon.
Alex is a PhD candidate working with Laber Labs. His research interests include reinforcement learning and interpretable models. We thought this posting was a great excuse to get to know a little more about him, so we asked him a question!
What is your favorite statistics-themed limerick?
This is a rap, not a limerick–because we live in 2019. It’s about the baddest motherf—er to ever enroll in the graduate statistics program at NC State.