In this post, I’ll briefly describe some nice results on predicting the directions in a flow field, based on work I did jointly with Jes Frellsen and Rich Turner. The settings are quite simple; more sophisticated ones can easily be conjured, but they would add unnecessary complexity to understanding the key concepts involved. I’ll update this post later with other cool examples.

When you perform actual measurements, like when measuring wind flow or maritime currents, you don’t necessarily perform them on an equidistant grid. In fact, geographic constraints may scatter your measurements so that they look like the output of some tessellation algorithm, with measurements available at some locations and still missing at others.

This can be a problem for a variety of methods, but not for kernel-based methods like Gaussian Processes. However, if we care about the direction of the flows, then even Gaussian Processes can give poor results, because they can’t really model the topology of a directional field. The best way around this is to use a kernel-based method that has calibrated uncertainty estimates on directions. One way to tackle this issue is the multivariate Generalised von Mises, or mGvM for short (and yes, this is a plug for my own work on circular distributions). This distribution can account for measurements on a regular grid or in other non-grid structures. In this post, I’ll show a simple example on a grid just to illustrate some desirable model properties. I will show a more complex flow on an irregular grid, as well as the calibration of the uncertainty estimates, in a second post.
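To see why treating directions as ordinary real values goes wrong, consider averaging two nearly-identical wind directions that happen to straddle north. This is a minimal sketch of the wrap-around problem (it is my own toy illustration, not part of the mGvM machinery itself):

```python
import numpy as np

# Two wind directions just either side of north: 350 and 10 degrees.
angles = np.deg2rad([350.0, 10.0])

# A naive Euclidean average treats the angles as plain real numbers
# and lands at 180 degrees (due south), the worst possible answer.
naive_mean = np.rad2deg(angles.mean())

# The circular mean embeds each angle on the unit circle, averages
# there, and takes the direction of the resultant vector: ~0 degrees.
resultant = np.exp(1j * angles).mean()
circular_mean = np.rad2deg(np.angle(resultant))

print(naive_mean, circular_mean)
```

A model built on circular distributions respects this geometry by construction, which is exactly what a Gaussian likelihood on raw angles fails to do.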

How well does this perform? Well, have a look at these pictures below, where the data points are in **black** and the model predictions are in **red**.

Ok, that was interesting, but it’s not that difficult a task. We could come up with several explanations, but basically the top quarter is constrained by the data on the top-left and bottom-right boundaries (and you could argue that there’s a substantial amount of data).

So what would happen if we made it a little more difficult for the model by supplying only the left side of the picture and completely removing the training data on the right? What would the algorithm do when predicting the right zone, where it had no data?
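The experiment isn’t reproduced here, but the data split can be sketched with a toy rotational field (the vortex field, the 10×10 grid, and the exact split are my own assumptions for illustration, not the data behind the figures):

```python
import numpy as np

# A toy 10x10 grid carrying a simple rotational ("vortex") flow:
# the direction at (x, y) is the angle of the tangent vector (-y, x).
xs, ys = np.meshgrid(np.linspace(-1, 1, 10), np.linspace(-1, 1, 10))
directions = np.arctan2(xs, -ys)

# Keep only the left half (x < 0) for training; the model would have
# to extrapolate the entire right half with no observations at all.
train_mask = xs < 0
train_coords = np.column_stack([xs[train_mask], ys[train_mask]])
train_angles = directions[train_mask]
test_coords = np.column_stack([xs[~train_mask], ys[~train_mask]])

print(train_coords.shape, test_coords.shape)
```

Any angles predicted at `test_coords` come purely from the correlations the model learned on the left half, which is what makes this an extrapolation test rather than an interpolation one.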

Those results were quite interesting: the learned flow pattern is similar to the original field despite the scarcity of data on the right side. And if we had even a couple of points there, the result would likely be much better.

To illustrate the previous point (that a little extra data gives the prediction more structure) and to probe the power of this model, let’s give it only a small bit of data, the bottom-left quarter of the picture, and ask it to predict the remaining three quarters. If we were to guess what the field looked like based only on that quadrant, we’d probably say something like this:

The interesting thing is that the flow field remains sensible given the data provided.

This short demo exemplifies some of the nice properties of models built on the multivariate Generalised von Mises and its inference algorithms. There are many other applications for this distribution beyond flow fields (phase content in audio and in images, anyone?), and other ways to predict flow fields (which is another topic I’m interested in).