November 6, 2018
Zekun (Jack) Xu, PhD Candidate
We almost never observe the absolute truth. In fact, there are entire industries driven by the single motivation of distorting it! How many times a day do you hear “You can look years younger!”? However, the distortion is not always intentional – has your GPS ever shown you driving through a nearby field? Or maybe your FitBit didn’t realize your leisurely walk was not a nap? Such distortions happen all of the time, and it can be hard to know what is true.
In our day-to-day lives, we, as the observers, must recognize that what we see or hear is not all that is there. We must continuously dig deeper to understand what is real. Frankly, it can be exhausting!
Fortunately, in science, there are tools that help us estimate the truth based on what we observe! In statistics, a class of models has been developed for just this purpose: latent (or hidden) state models. Most of the models in this class are based on either the so-called dynamic linear model or the hidden Markov model. Both models date back to the 1960s [1][2], but they are still popular. And, in my opinion, they are the coolest generative models!
In the dynamic linear model, we assume that the data we observe over time are a noisy realization of a latent true process. For instance, to monitor air quality, the EPA records hourly concentration values for a variety of airborne pollutants (O3, NO2, etc.). However, because of measurement error from the device and its operation, the recorded data are a noisy version of the true values. A similar example is a navigation system, where the GPS data from the satellite deviate from the actual coordinates to some extent, which is why you sometimes appear to be driving through a corn field! In both cases, a dynamic linear model can be used to filter out the noise and make predictions.
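To make the filtering idea concrete, here is a minimal sketch of the simplest dynamic linear model, a random walk observed with noise, filtered with the Kalman recursion from [1]. Everything specific here is an assumption for illustration: the function names, the simulated data, and the variances `process_var` and `obs_var`, which are taken as known rather than estimated from real sensor readings.

```python
import random


def kalman_filter_local_level(observations, process_var, obs_var,
                              init_mean=0.0, init_var=1e6):
    """Filter a random-walk-plus-noise model (hypothetical helper).

    Latent truth:  x_t = x_{t-1} + w_t,  w_t ~ N(0, process_var)
    Observation:   y_t = x_t + v_t,      v_t ~ N(0, obs_var)
    Returns the filtered mean of x_t after each observation.
    """
    mean, var = init_mean, init_var  # large init_var = vague prior
    filtered = []
    for y in observations:
        # Predict: a random walk keeps the mean but gains uncertainty.
        var_pred = var + process_var
        # Update: the gain weighs the new observation against the prediction.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var_pred
        filtered.append(mean)
    return filtered


def simulate(n, process_var, obs_var, seed=0):
    """Simulate a latent random walk and its noisy recordings."""
    rng = random.Random(seed)
    truth, noisy = [], []
    x = 0.0
    for _ in range(n):
        x += rng.gauss(0.0, process_var ** 0.5)
        truth.append(x)
        noisy.append(x + rng.gauss(0.0, obs_var ** 0.5))
    return truth, noisy


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


truth, noisy = simulate(500, process_var=0.01, obs_var=1.0)
filtered = kalman_filter_local_level(noisy, process_var=0.01, obs_var=1.0)
print("error of raw recordings:", mse(noisy, truth))
print("error after filtering:  ", mse(filtered, truth))
```

When the latent process moves slowly relative to the measurement noise, as assumed in this simulation, the filtered series should track the truth more closely than the raw recordings do. In a real application the two variances would have to be estimated from the data.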
In the hidden Markov model, we assume that there is more than one underlying data-generating mechanism, the so-called states. Consider, for instance, physical activity data gathered through wearable devices like the FitBit and Apple Watch. Those data do not contain activity labels; they provide only the intensity during the wearing time, which is driven by the actual activity state. In fact, one of my current research projects is to identify interesting patterns in human activity data measured by wearable devices worn continuously. Based on those data, we want to be able to determine whether a subject is doing high intensity activity (e.g., running), medium intensity activity (e.g., walking), or low intensity activity (e.g., resting) at different times of the day. We can use this information to compare lifestyles across subjects. This is an interesting topic, especially in this era of artificial intelligence. For example, we might be able to build “smart” wearable devices based on this model that provide personalized suggestions regarding healthy lifestyle choices. This framework is useful both for predicting health outcomes and for modeling the effect of activity on them.
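As a rough illustration of how hidden activity states can be recovered from intensity readings, the sketch below runs the Viterbi algorithm on a three-state hidden Markov model with Gaussian emissions. All of the numbers are hand-picked assumptions, not values from my research: the emission means and standard deviations, the "sticky" transition probabilities, and the toy intensity sequence. In practice these parameters would be estimated from data, for example with the Baum-Welch (EM) algorithm.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(observations, states, log_init, log_trans, emission_params):
    """Return the most likely hidden state sequence (hypothetical helper)."""
    # best[s] = log probability of the best path so far that ends in state s
    best = {s: log_init[s] + gaussian_logpdf(observations[0], *emission_params[s])
            for s in states}
    backpointers = []
    for y in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best previous state from which to move into s.
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            pointers[s] = prev
            new_best[s] = (best[prev] + log_trans[prev][s]
                           + gaussian_logpdf(y, *emission_params[s]))
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Assumed parameters for illustration only.
states = ["low", "medium", "high"]
emission_params = {"low": (1.0, 1.0), "medium": (5.0, 1.5), "high": (10.0, 2.0)}
log_init = {s: math.log(1.0 / 3.0) for s in states}
# "Sticky" transitions: stay in the same state with probability 0.9.
log_trans = {p: {s: math.log(0.9 if p == s else 0.05) for s in states}
             for p in states}

# Toy intensity readings (made up): resting, walking, running, resting.
intensity = [0.8, 1.2, 0.5, 5.1, 4.7, 5.5, 9.8, 10.5, 11.0, 1.0]
decoded = viterbi(intensity, states, log_init, log_trans, emission_params)
for y, s in zip(intensity, decoded):
    print(f"{y:5.1f} -> {s}")
```

Working with log probabilities avoids numerical underflow on long sequences, which matters for devices that record continuously for days.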
To paraphrase Plato’s idealism, truth is an abstraction of the external world that we live in. All that we observe is a projection of the ideal world into reality. It is great to have some tools that aim to uncover the truth from the observation, but care must be taken regarding when those tools are applicable.
References
[1] Kalman, Rudolf Emil. “A new approach to linear filtering and prediction problems.” Journal of Basic Engineering 82.1 (1960): 35-45.
[2] Baum, Leonard E., and Ted Petrie. “Statistical inference for probabilistic functions of finite state Markov chains.” The Annals of Mathematical Statistics 37.6 (1966): 1554-1563.
Jack is a PhD candidate working with Laber Labs. His research interests include wearable computing and hidden Markov models. Currently he is working on hidden Markov models with applications in veterinary data. We thought this posting was a great excuse to get to know a little more about him, so we asked him a question!
What are the five qualities that great PhDs and great artists share?