lattice boltzmann and data compression


I am writing a GPU LBM implementation, using the thesis “AB INITIO DERIVATION OF THE CASCADED LATTICE BOLTZMANN AUTOMATON” by Martin Geier as my guide.

The above-mentioned thesis describes a D3Q27 model, the velocities of which can be represented using integers.

It struck me that the information in a D3Q27 model might be amenable to simple lossless data compression methods, for example, operating on blocks of cells, storing the min and max value in each of the directions explicitly, and storing the actual values using the appropriate number of bits for placing them in the (min,max) range.

Given the arithmetic intensity of the approach described in the thesis above, decompressing a 4x4x4 block of cells would be a relatively quick operation. If the compression was simple enough, accessing arbitrary cells in adjacent blocks would also be a relatively simple operation.

Has this approach been explored at all?

Thanks in advance,

Damien Morton

Hi Damien,

I have never heard of anybody using run-time data compression with LB. The only exception is the trick by P.A. Skordos (which is documented here) that consists in subtracting the lattice weight t_i from each particle population. This gains significant digits, and you may be able to switch from double-precision to single-precision arithmetics without loss of accuracy. You can therefore view this as a simple form of lossless data compression.

To get back to your idea, which I actually find very interesting: I think that you need to have data which is strongly correlated in space in order to shrink its size significantly with this type of algorithm. Strong local coherence is certainly not granted when you simulate a turbulent flow. Here, the changes from a lattice site to its neighbor are of the same order of magnitude, or only a few orders of magnitude smaller, than value of the data itself. Even in laminar flows, I wonder if the correlation between lattice sites is sufficiently strong. The exception may be simulations at very high resolution, like the ones used for high-precision benchmark runs. Indeed, you decrease the gradient of the flow variables between adjacent lattice sites when you increase the resolution.

But this is just my gratuitous opinion, as I have never experimented with data compression in fluids. Maybe somebody else has a more knowledgeable opinion?

Actually, I remember having tried to compress data files which I get from simulations of turbulent flows with gzip, but this never works. This may sound a bit naive (of course this doesn’t work, how would gzip know about the structure of a flow?), but then again, the zip algorithm does try to exploit local coherence of the data, doesn’t it?

The application I have in mind for LBM is modeling airfoils and aircraft of various geometries. Given the low-resolution 2D airfoil simulations I have seen, there appear to be large areas of relatively coherent flow. In fact, this is something of a goal when designing airfoils.

For highly turbulent flow, for example, flows through a filter and suchlike, the compression scheme I outlined wouldn’t be very effective.

Its interesting that you mention that increasing the resolution tends to increase coherency - there might be some kind of golden circle here in which compression increases resolution which increases coherency which increases compression which …

General purpose compression algorithms such as zip and gzip work by identifying sequences of bits or bytes that are repeated in the data stream and replacing them with compactly encoded symbols. This works great for text, but not so well for floating point data.

You might find that zip and gzip would work better if you organized your data into planes, with the low-order 8-bits of your data organised into one plane, the next 8 bits into the next plane, etc, and the exponent into another plane. You would also want to organise your data for 1D, 2D, and possibly 3D spatial coherency using some kind of space-filling curve (e.g. a Hilbert curve). After all this, you would tend to find that the higher order bit planes would compress better than the lower order ones (which might not compress at all), giving a relatively poor compression rate overall.

OK, I get your idea with reorganizing data in a way for which standard compression algorithms become more effective. I think that this would actually be a nice thing to try, additionally to your idea of run-time data compression. People would certainly be happy over the possibility to reduce storage requirements for CFD data.

When I said that your data has stronger local correlation at high resolution, I was referring to a simple arithmetic argument. Let’s call df/dx the gradient of a variable, as measured in a lattice independent system of units (for example in the units of a real physical system), and DeltaX the distance between two adjacent lattice sites, in the same system of units. Then, the value of f between two adjacent lattice sites roughly differs by

f(x+DeltaX)-f(x) \approx df/dx DeltaX

But if you increase the lattice resolution by a factor two (DeltaX --> DeltaX/2), this becomes

f(x+DeltaX/2)-f(x) \approx 1/2 (df/dx DeltaX) ,

which means, the difference between two adjacent lattice sites has become smaller by the same factor.

The more sophisticated data compression algorithms rely on a predictive model, efficiently encoding the difference between the predicted value and the actual value, with the predicted value being derived from some context - be it adjacent values or previous values.

I am something of a novice at LBM, but for example, in a D3Q27 model, arent the velocities ascibed to each direction strongly correlated to each other.

Do you mean that f[sub]1[/sub] is correlated to f[sub]2[/sub] on a given lattice site, or that f[sub]1[/sub] on a given site is correlated to f[sub]1[/sub] on a neighboring node?

If the forces at a given site are f[sub]1[/sub] through f[sub]27[/sub], then yes, my question was whether or not they are correlated to each other, e.g f[sub]1[/sub] to f[sub]2[/sub] at the same site.

For example, if the Northwards direction has a high value, might the Southwards direction tend to have a low or negative value. If the NE and NW directions have high values, might there be a high probability that the N direction also has a high value.

I am picturing that the direction and ‘length’ of each of the direction vectors defines an envelope, and that the general shape of this envelope might be able to be approximated with a smaller amount of information than 27 floating-point values, and then corrected.

I am to write a cascaded LBM in 2D. But I have some problems. Would you give me a favor if you have one and?