November 19, 2019
Yeng Saanchi, PhD Candidate
It was love at the nth glance. I can only pinpoint within some epsilon accuracy when this happened. I must say that it took me approximately O(1/epsilon) iterations of personal deliberation to converge to this realization. And even then, with my usual intransigence, I was loath to admit that I had finally fallen irrevocably (I hope) in love. It was only after a colleague brought to my attention how my eyes lit up when this love interest came up in conversation that I finally let go of the last of my resistance.
Confused? Sorry. Let me start from the beginning.
Last year, during my time at the Science and Engineering Festival in DC, I encountered a middle schooler who couldn’t stop waxing lyrical about the virtues of pi. I didn’t quite understand her fascination with the number, but I envied her because she was obviously in love. I was in my third year and I hadn’t yet found the area of Statistics that inspired me to immerse myself in it. I was still dabbling in variable selection methods, and though this was an interest of mine, it simply didn’t do it for me. It lacked the excitement and appeal that I craved. The turning point came when I took a class on Artificial Intelligence in the Physics Department, of all places. While running a simulation based on one of my homework assignments, I discovered something that puzzled me: after training my machine learning model using some SGD variant, my test error was appreciably lower than the training error. I went to my advisor to ask for his thoughts on the matter, and that was the beginning of my maddening but altogether glorious relationship with Stochastic Gradient Descent, SGD for short.
Now, a little background. SGD belongs to the family of recursive algorithms known as Stochastic Approximation (SA) algorithms. The first SA algorithm was the result of a partnership between Herbert Robbins and Sutton Monro. Their dream was to find the root of an unknown function expressed as an expected value, given only noisy estimates of the function values at different points in the domain. This dream was realized when they deduced that one could start at an arbitrary value in the domain of the function and obtain the next iterate by taking a step proportional to the negative of the noisy function value observed there. They showed that, provided the function is nondecreasing and has a positive derivative at its unique root, and the step-sizes satisfy certain requirements (they must sum to infinity while their squares sum to a finite value), the algorithm will converge to that root.
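To make the recursion concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than anything taken from the original paper: the function g(x) = 2(x - 3), the Gaussian noise, the step-size sequence a/n, and the names `robbins_monro` and `noisy_g` are all hypothetical choices made for this example.

```python
import random


def robbins_monro(noisy_g, x0, n_iter=5000, a=1.0):
    """Iterate x_{n+1} = x_n - a_n * Y_n, where Y_n is a noisy value of g(x_n).

    The step-sizes a_n = a / n are one common choice: they sum to infinity
    while their squares sum to a finite value.
    """
    x = x0
    for n in range(1, n_iter + 1):
        step = a / n
        x = x - step * noisy_g(x)
    return x


# Hypothetical target: g(x) = 2 * (x - 3), which is increasing with root x = 3.
# We only ever observe g(x) plus Gaussian noise (an assumption for this demo).
rng = random.Random(0)


def noisy_g(x):
    return 2.0 * (x - 3.0) + rng.gauss(0.0, 1.0)


root_estimate = robbins_monro(noisy_g, x0=0.0)
print(f"estimated root: {root_estimate:.3f}")
```

Even though no single evaluation of `noisy_g` is exact, the shrinking steps average the noise away, so the estimate should settle close to 3.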
SGD is used in the field of stochastic optimization, specifically for minimizing convex cost functions whose gradient is difficult to obtain analytically or expensive to compute. SGD is especially useful when data is not static but streaming in. In this case, noisy estimates of the gradient are obtained using a single observation or a subset of the data. It is also one of the most important methods used in training machine learning models.
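The streaming case can be sketched in a few lines of Python. This is only an illustration, assuming a least-squares cost, a simulated data stream, and a step-size that decays like 1/sqrt(t); the names `sgd_least_squares` and `simulated_stream` and the "true" weights are hypothetical.

```python
import random


def sgd_least_squares(stream, dim, lr0=0.5):
    """One pass of SGD for least squares, updating on one observation at a time."""
    w = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        lr = lr0 / t ** 0.5  # decaying step-size (an assumed schedule)
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # Gradient of 0.5 * (pred - y)^2 with respect to w is err * x,
        # a noisy estimate of the full gradient from a single observation.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def simulated_stream(n, true_w, seed=0):
    """Hypothetical data stream: y = <true_w, x> + small Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, 0.1)
        yield x, y


true_w = [2.0, -1.0]
w_hat = sgd_least_squares(simulated_stream(5000, true_w), dim=2)
print([round(wi, 2) for wi in w_hat])
```

Each observation is used once and then discarded, which is why this style of update suits data that arrives as a stream.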
Despite its apparent utility, SGD is sensitive to the choice of step-size (also called the learning rate), which strongly affects its rate of convergence. Since its inception, there have been myriad adaptations of the method, both to inform the selection of the learning rate so as to speed up convergence and to tailor the method to specific problems. From Adam and EVE to Pegasos and SAGA, SGD and its many variants are highly suited to solving a wide range of optimization problems.
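The step-size sensitivity is easy to see on a toy problem. The sketch below is a hypothetical one-dimensional example, not a benchmark: it minimizes 0.5 * E[(w - z)^2] with z drawn from a normal distribution with mean 5, so the minimizer is w = 5, and it compares three constant step-sizes chosen purely for illustration.

```python
import random


def run_sgd(step, n_iter=200, seed=0):
    """Constant-step SGD on 0.5 * E[(w - z)^2], with z ~ N(5, 1)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(n_iter):
        z = rng.gauss(5.0, 1.0)
        grad = w - z  # stochastic gradient from a single draw of z
        w -= step * grad
    return w


# Too small, moderate, and too large step-sizes (illustrative values only).
results = {step: run_sgd(step) for step in (0.001, 0.1, 2.5)}
for step, w in results.items():
    print(f"step={step}: final w = {w:.3g}")
```

With the smallest step the iterate has barely left its starting point after 200 updates, the moderate step hovers near 5, and the largest step makes the iterates blow up, since each update overshoots the minimizer by more than it corrects.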
Although SGD is widely used in many different fields, from acoustics to neuroscience, the problem of quantifying the uncertainty around its estimates has received very little attention. The goal of my work is, therefore, to explore similarities between the implicit regularization of SGD and classical regularization methods that may give us a deeper understanding of how the algorithm works and enable us to better estimate the variability surrounding SGD iterates.
Even though SGD frustrates me no end, I still marvel at its versatility and the beauty of its simplicity. Ours is a love-hate relationship, but I’ve heard that those are the best. I hope so!
Yeng is a PhD Candidate whose research interests include predictive modeling and variable selection. Her current research focuses on exploring similarities between stochastic gradient methods and classical regularization methods. We asked a fellow Laber Labs colleague to ask Yeng a probing question.
Humanity flees an uninhabitable Earth in some distant future. Due to AI-driven mutagenic warfare, certain foods are now semi-sentient creatures that human beings stalk and hunt for survival. You are the Final Arbiter in deciding which of these semi-sentient foods humanity will take on its last indefinite voyage. An affable, anthropomorphized container of french fries stands before you. Please evaluate the french fries, their pros, and their cons. Please provide your final verdict on whether humanity leaves all french fries to wither in the husk of the Earth, or whether humanity will provide precious space for french fry livestock onboard its last ship to the stars.
I must say that I think I’m highly qualified to be the Final Arbiter of Food, though I have very particular tastes that few people share. There are several problems with semi-sentient foods. For instance, what happens when a child decides that she can’t eat Bruce or Barney or any one of the anthropomorphic potatoes that she’s made friends with?
Having said that, I’ll move on to the merits of potatoes, non-sentient ones, to be precise. Potatoes are very versatile. To borrow the words of Samwise the Brave to the creature Gollum, you can boil them, mash them, or stick them in a stew, among other things. They are also a rich source of nutrients such as protein, fiber, Vitamin C, potassium, and carbohydrates. Guess what? They’re gluten-free too!
Despite their many advantages, potatoes can be detrimental to people with diabetes or obesity, as they contain high levels of simple carbohydrates. Potatoes must also be stored at temperatures between 7 and 10 degrees Celsius; otherwise they may turn green, an indicator of the presence of a toxic compound called solanine.
Now concerning the container of greasy, (makes a sound of distaste) anthropomorphic french fries, as Arbiter of Food, I will move to leave them behind. Most people, being partial to french fries, may hate me for this decision. But alas, that is the lot of visionary leaders. I say let’s gather all the non-sentient foods first, including Jerusalem artichokes (laughs uproariously), then we can consider the semi-sentient ones. What say you?