The Shape of Data

December 31, 2019

Matthew Zabka, Post-Doc

Hello, dear readers! As one of Laber Labs' newest members, allow me to introduce myself. My name is Matt Zabka, but most in the lab call me Zabka. As I do in most of my introductions and first dates, allow me to ask the following question: do you spend enough time thinking about shapes?

I’m not talking about shapes you can see, which lie in two or three dimensions. Honestly, if you’re spending a lot of time thinking about squares and trapezoids, you need to step back and reevaluate your life priorities. No, I’m talking about the shapes you cannot see: high-dimensional shapes.

Very heuristically, the study of shapes is called topology, and it turns out that this branch of mathematics can be applied to data analysis. This field is called topological data analysis, and this is the area in which I earned my PhD.

Now, if you’re a keen reader—and all readers of the Laber Labs blog are, except my ex-girlfriends—you might ask yourself whether data in high dimensions can have shape at all. After all, does it even make sense to talk about a shape like a circle in four dimensions?

The answer is yes. Even more surprisingly, many shapes exist in higher dimensions that cannot exist in the lower dimensions we can see. Perhaps the easiest examples of such shapes are higher-dimensional spheres. In d dimensions, a (d-1)-dimensional sphere of radius r is the set of all points at distance r from a fixed center point. It is easy to verify that this definition gives a circle in two-dimensional space and a (two-dimensional) sphere in three-dimensional space.
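To make the sphere definition concrete, here is a minimal Python sketch that samples points on a three-dimensional sphere sitting in four-dimensional space and checks that each one is at distance r from the center. The function names `sample_sphere` and `distance`, the chosen center, and the radius are all illustrative assumptions, not anything from our project.

```python
import math
import random


def sample_sphere(d, r=1.0, center=None, rng=random):
    """Draw one point from the (d-1)-dimensional sphere of radius r in d dimensions."""
    if center is None:
        center = [0.0] * d
    # Independent standard normals give a direction with no preferred orientation;
    # rescaling that vector to length r puts the point on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [c + r * x / norm for c, x in zip(center, g)]


def distance(p, q):
    """Euclidean distance between two points of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


random.seed(0)
center = [1.0, -2.0, 0.5, 3.0]  # hypothetical center in 4 dimensions
points = [sample_sphere(4, r=2.0, center=center) for _ in range(5)]
for p in points:
    # Each distance should equal the radius, up to floating-point rounding.
    print(round(distance(p, center), 6))
```

Setting d to 2 or 3 in the same code produces points on an ordinary circle or an ordinary sphere, which is the verification mentioned above.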

A branch of algebraic topology called homology is very good at detecting spheres. In data analysis, one can employ homology and a method similar to hierarchical clustering to answer, for example, whether high-dimensional data lie on spheres. This branch of data analysis is called persistent homology, and there has been significant progress in understanding it over the past 15 years.
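The link to hierarchical clustering is easiest to see in dimension zero, where homology only counts connected components. The sketch below is a simplified illustration under that assumption: it grows a distance threshold, merges points whose distance falls below it (exactly what single-linkage clustering does), and records the scale at which each component disappears. The function name `h0_persistence` and the toy point cloud are hypothetical. Detecting circles or higher-dimensional spheres needs higher-dimensional homology, which this sketch does not compute.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the scales at which connected components merge (0-dimensional persistence)."""
    n = len(points)
    parent = list(range(n))  # union-find forest: every point starts as its own component

    def find(i):
        # Follow parent links to the component's representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at this scale
            deaths.append(length)  # so one of them "dies" here
    return deaths  # n - 1 values; the final component never dies


# Toy data: a cluster of three points and a distant cluster of two.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
print(h0_persistence(points))
```

For this toy data, the merges inside each cluster happen at a small scale and the last merge, which joins the two clusters, happens at a much larger one. That long-lived gap is the kind of feature that persists, and it is what tells the analyst there are two clusters.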

There are also higher-dimensional shapes that are extremely complex. Understanding how shapes in higher dimensions can twist or invert themselves is very difficult, but if data in higher dimensions lie on such a shape, it could be of interest to an analyst.

While homology is good at detecting spheres, it does not detect the complexities of higher-dimensional shapes. Fortunately, algebraic topology offers tools for analyzing such complexities. One such family of tools is the Steenrod squares, which are examples of cohomology operations.

These tools have not yet been applied to data. In my short time at Laber Labs, Khuzaima Hameed and I have started a project to analyze the Steenrod squares on topological spaces generated by data. Once Khuzaima and I review the necessary algebraic topology, our investigation will begin with observing under what conditions on the distribution of random data we can see non-trivial Steenrod squares. Keep reading our blog for updates on this project!

Matt is a Post-Doc in Laber Labs. His research interests lie in topological data analysis and random topological spaces. We thought this was a great opportunity to get to know our newest group member better, so we asked him a probing question!

• Suppose you were visited by the Angel of Death and told that you had a week to live, but before you bid adieu to your delightful existence on earth, you were given the opportunity either to go back in time to change something (which could change the course of history) or to go into the future to effect change. Which would you choose, why, and what would you change?

I would rather effect change in the future. If I could look into the future and see what would go wrong without changes in the present, I could work hard to try to fix things. Or, depending on the future problem, maybe I'd decide I couldn't do anything about it and just start partying a lot?