As always, it is quite difficult to draw clear conclusions from benchmarks, because the real value of a code depends on much more than a single performance number. Just as important are how well it is adapted to the problems you investigate, how user-friendly it is, and how portable it is, to mention just a few points. Therefore, everything I am going to write should be taken with care.
OpenLB is based on a regular-matrix philosophy, as opposed to the individual link-tracking philosophy, such as the one in the article you mention. Actually, in the early days of OpenLB back at the University of Geneva, another LB code was circulating, written by Alexandre Dupuis, which uses individual link-tracking ( http://dx.doi.org/10.1016/S0167-739X(99)00130-2 ). Unfortunately, we never ran comparison benchmarks, as Alexandre left the group at about the time I arrived, so this remains an open question.
Obviously, the matrix-based formalism is well adapted to modern cache-based processors, and it is very efficient in regular geometries. Of course, a link-based formalism becomes interesting in a porous medium with only few fluid nodes, but my guess is that for this advantage to be significant, the medium must contain very few fluid nodes indeed. My intuition is that you can easily lose an order of magnitude of efficiency by abandoning a matrix-based formalism, whereas reducing the number of active nodes in a link-based formulation gains you a small factor at best.
I’ve had a quick and not-very-careful look at the article you cite, and here’s how I read Fig. 6 (please tell me if I misread it). The authors benchmark a porous and a non-porous medium confined in a geometry which, at its largest, has a size of 80^3. Let’s focus on this case and compute the number of site updates per second they reach, counting all lattice sites in the calculation, not only fluid nodes.
In the non-porous medium they observe 80^3*100/600/1e6 = 0.09 mega site updates per second, no matter which code they use.
In the porous medium, the regular code again reaches 0.09 mega su/s, whereas the link-based one goes up to 0.26 mega su/s. The conclusion is a speed-up of almost 3, which seems to be in favor of the link-based approach.
But when I run OpenLB on an Intel machine bought in 2003 (which should make it comparable to the machine used in the article), I get a performance of over 1 mega su/s in a regular 80^3 geometry with the D3Q19 lattice. That’s four times more. A conclusion here could be that the link-based approach is not really fast; it’s just that the authors compare it with a (very) slow matrix-based code. But as I said, one should never draw too many conclusions from a single benchmark value.