I was wondering what sort of performance people were getting out of their LB implementations as a guideline as to what I should be aiming for. I notice some benchmarks with OpenLB at http://www.lbmethod.org/openlb/benchmarks.html and was wondering if there were any other benchmarks available… maybe for slightly smaller systems or for implementations on GPUs.
If you google for “lattice Boltzmann benchmark”, or something similar, you will find a huge number of papers reporting LB performance benchmarks on all imaginable types of machines, including GPUs and Cell processors. Sometimes, I almost get the impression that there are more people trying to optimize codes than physicists or engineers trying to actually use the method …
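When you compare your own code against such papers, keep in mind that LB benchmarks commonly report throughput in million lattice-site updates per second (MLUPS), i.e. number of cells times number of time steps, divided by wall-clock time. Here is a minimal sketch of how you could measure it for your own code. The names `mlups`, `benchmark` and `step` are just placeholders I made up for illustration; `step` stands for whatever function advances your lattice by one collide-and-stream iteration.

```python
import time


def mlups(num_cells, num_steps, elapsed_seconds):
    """Return throughput in million lattice-site updates per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_cells * num_steps / elapsed_seconds / 1e6


def benchmark(step, num_cells, num_steps, warmup=10):
    """Time num_steps calls of step() and convert the result to MLUPS.

    step is a hypothetical zero-argument callable that advances the
    lattice by one time step; replace it with your own solver's update.
    """
    # Warm-up iterations so caches and lazy initialisation do not
    # distort the timed run.
    for _ in range(warmup):
        step()

    start = time.perf_counter()
    for _ in range(num_steps):
        step()
    elapsed = time.perf_counter() - start

    return mlups(num_cells, num_steps, elapsed)
```

As a sanity check on the arithmetic: a 100 x 100 x 100 lattice (10^6 cells) advanced by 1000 steps in 20 seconds corresponds to `mlups(100**3, 1000, 20.0)`, which is 50 MLUPS. When you compare with published numbers, check that the lattice (D2Q9, D3Q19, ...), the precision and the boundary treatment are comparable, otherwise the figures say little.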
As a personal recommendation, I would say that although you’ll find papers which report impressive results on specific hardware, these are probably not the values with which you want to compare your own code. As soon as you are subject to the constraints of everyday life, such as the need to include various physical ingredients in your code or to keep the code portable between different platforms, you rapidly end up with more modest performance.
GPUs are a typical case in point. Although the theoretical peak performance you can obtain with a GPU implementation of LB is very high, starting to play around with GPUs is in general not what you want to do (depending on your scope, of course). Given that the programming environments for this type of hardware are quite exotic, you risk investing a substantial part of your PhD in GPU programming, rather than in solving actual physics.
Instead, I would suggest that you focus on papers with benchmark results embedded in the presentation of a simulation which is interesting in its own right due to its physical content. An example is the following article.