Quick word on predicting flow directions

In this post, I’ll briefly describe some nice results on predicting the directions in a flow field, based on work I did jointly with Jes Frellsen and Rich Turner. The examples are quite simple; more sophisticated settings could easily be conjured, but they would add unnecessary complexity to the key concepts involved. I’ll update this post later with other cool examples.

When you perform actual measurements, like when measuring wind flow or maritime currents, you don’t necessarily perform them on an equidistant grid. Sometimes geographic constraints scatter your measurements like the output of some form of tessellation algorithm, where some measurements are available and others are yet to be obtained.

This can be a problem for a variety of methods, but not for kernel-based methods like Gaussian Processes. However, if we care about the direction of the flows, then even Gaussian Processes can give poor results, because they can’t really model the topology of a directional field. The best way around this is to use a kernel-based method that has calibrated uncertainty estimates on directions. One possible way to tackle this issue is the multivariate Generalised von Mises, or mGvM for short (and yes, this is a plug for my own work on circular distributions). This special distribution can account for measurements on a regular grid or in other non-grid structures. In this post, I’ll show a simple example on a grid, just to illustrate some desirable model properties. In a second post, I will show a more complex flow on an irregular grid structure, as well as the calibration of the uncertainty estimates.
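To make the modelling issue concrete, here is a minimal sketch of the naive alternative: fit independent Gaussian Process regressors to the sine and cosine of the flow angles and project the predicted means back onto the circle. This is *not* the mGvM, just a baseline that gives point predictions of direction; its marginal GP variances are not a calibrated uncertainty on the direction itself, which is the gap a circular distribution fills. The synthetic swirl data and the use of scikit-learn’s `GaussianProcessRegressor` are my own assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic flow field: directions swirl around the centre of the unit square.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(50, 2))            # scattered measurement sites
theta_train = np.arctan2(X_train[:, 1] - 0.5,
                         X_train[:, 0] - 0.5) + np.pi / 2

# Naive baseline: regress sin and cos of the angle with two independent GPs.
gp_sin = GaussianProcessRegressor(kernel=RBF(length_scale=0.2))
gp_cos = GaussianProcessRegressor(kernel=RBF(length_scale=0.2))
gp_sin.fit(X_train, np.sin(theta_train))
gp_cos.fit(X_train, np.cos(theta_train))

# Predict at held-out locations and project back onto the circle.
X_test = rng.uniform(0, 1, size=(10, 2))
theta_pred = np.arctan2(gp_sin.predict(X_test), gp_cos.predict(X_test))

# theta_pred gives point estimates of direction in [-pi, pi], but the two GPs'
# variances do not combine into a calibrated distribution over angles.
print(theta_pred.shape)
```

The projection step (`arctan2` of the two predicted components) is what any naive approach must do, and it is exactly where the uncertainty bookkeeping breaks down: a distribution on the circle, like the mGvM, models the angles directly instead.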

How well does this perform? Well, have a look at these pictures below, where the data points are in black and the model predictions are in red.

Yup, predictions right on!
Figure 1: Toy example on a flow field with the top-right quarter of the field removed. The predictions (red) are made where there was no training data; they match the original flow field.

Ok, that was interesting, but it’s not that difficult a task. We could come up with several different explanations, but basically the missing quarter is constrained by the data in the adjacent top-left and bottom-right quarters (and you could argue that there’s a substantial amount of data).

So what would happen if we made it a little more difficult for the model by supplying only the left side of the picture and completely removing the training data on the right? What would the algorithm do when predicting the right zone, where it had no data?

The predictions are still sensible!
Figure 2: Available data for learning (black) and predictions from the learned field where there was no data (red). While the predictions are not a total match for the original field, they are very close.

Those results were quite interesting: the learned flow pattern is similar to the original field despite the scarcity of data on the right side. And if we had even a couple of points there, the result could be much better.

To exemplify the earlier point that a little extra data gives a bit more structure to the prediction, and also to investigate the power of this model, let’s give our model a small amount of data, comprising only the bottom-left quarter of the picture, and ask it to predict the remaining three quarters. If we were to guess what the field looked like based only on that quadrant, we’d probably say something like this:

Yes, I bet you'd have guessed something like the model - I did!
Figure 3: Model predictions under low-data availability. In the absence of data to learn the complex flow structure, the model still yields a very sensible response.


The interesting thing is that the predicted flow field is still sensible given the data.

This short demo exemplifies some of the nice properties of models that use the multivariate Generalised von Mises and its inference algorithms. There are many applications for this distribution beyond flow fields (phase content in audio and in images, anyone?) and other ways to predict flow fields (another topic I’m interested in).

Updated presentations

I have added the presentation on “Probabilistic Data Structures and Algorithms” that Christian Steinruecken and I prepared for the Machine Learning Reading Group in 2013.

It’s very interesting material, especially if you don’t know where to start on data structures for scalable machine learning and very large datasets – i.e. big data, if you like buzzwords.

If this topic caught your eye, you should have a look at Prof. Alex Smola’s blog entry on Bloom filters, his talk on data streams, and Hash Kernels by Qi et al. (2009).

New content and presentations

I just added a new section where you can check out some of the presentations I’ve made during the last few years. Hopefully that will be of interest to some people.

I have also completely changed the site layout to a blog format so I may make posts concerning my research and other topics of interest over the years.

Hope you like it! 😀