I am glad to announce that the project “from CPU to GPU in 80 days” is now complete, with a fully functional GPU-capable version of Palabos. As a reminder, the goal was to keep the Palabos user interface unchanged, so that identical applications can run on either CPU or GPU. We achieved this by introducing a new structure, the “AcceleratedLattice”, which implements the features of the “MultiBlockLattice” on GPU. Although we did not port all of Palabos to GPU within this project, we were able to run all three proposed test cases on GPU.
The test cases are described in the image below:
The performance achieved for each of the three test cases on two multi-core CPUs (using the original Palabos with MPI parallelism) and on an NVIDIA RTX 3090 GPU (a recent gaming GPU) is shown below:
The dotted lines correspond to the performance reached by the simple STLBM code for the TGV benchmark, which shows that the Palabos GPU implementation is close to the optimal performance that can be expected from a GPU code based on parallel algorithms.
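To give a feel for what a "GPU code based on parallel algorithms" can look like, here is a minimal sketch. It assumes that the term refers to the C++17 standard parallel algorithms, and it is not taken from STLBM or Palabos. The `Cell` type, the functions `collideCell`, `collideAll` and `totalMass`, and the macro `USE_PARALLEL_POLICY` are all hypothetical names chosen for illustration. The example applies a BGK collision step to every cell of a toy D1Q3 lattice through `std::for_each`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>
#ifdef USE_PARALLEL_POLICY   // hypothetical switch, not a Palabos or STLBM option
#include <execution>
#endif

// Hypothetical toy cell: the three populations of a D1Q3 lattice.
using Cell = std::array<double, 3>;

// BGK collision on a single cell; omega is the relaxation parameter.
inline void collideCell(Cell& f, double omega) {
    constexpr double w[3] = {2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0};  // lattice weights
    constexpr double c[3] = {0.0, 1.0, -1.0};                   // lattice velocities
    const double rho = f[0] + f[1] + f[2];                      // density
    assert(rho > 0.0);
    const double u = (f[1] - f[2]) / rho;                       // velocity
    for (int i = 0; i < 3; ++i) {
        const double cu = c[i] * u;
        const double feq =
            w[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * u * u);
        f[i] += omega * (feq - f[i]);                           // relax towards equilibrium
    }
}

// Apply the collision to all cells with a standard algorithm. With the
// parallel policy enabled, a suitable compiler may run the loop on many
// CPU cores or offload it to a GPU.
inline void collideAll(std::vector<Cell>& lattice, double omega) {
#ifdef USE_PARALLEL_POLICY
    std::for_each(std::execution::par_unseq, lattice.begin(), lattice.end(),
                  [omega](Cell& f) { collideCell(f, omega); });
#else
    std::for_each(lattice.begin(), lattice.end(),
                  [omega](Cell& f) { collideCell(f, omega); });
#endif
}

// Sum of all populations, used to check that the collision conserves mass.
inline double totalMass(const std::vector<Cell>& lattice) {
    double mass = 0.0;
    for (const Cell& f : lattice) {
        mass += f[0] + f[1] + f[2];
    }
    return mass;
}
```

The point of the pattern is that the per-cell code stays ordinary C++, and only the execution policy decides how the loop is run. Whether the parallel version actually executes on a GPU depends on the compiler and its options (for example, NVIDIA's `nvc++` provides a `-stdpar` mode for this purpose). On some CPU toolchains, the parallel policies need an additional threading library at link time.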
Don’t hesitate to go ahead and test the three examples. The code is available at
Details of the project and instructions on how to compile the code are provided in the README.
As next steps, we will:
- Benchmark the multi-GPU performance
- Port larger parts of Palabos to GPU
- Push the changes to the main Palabos repository through a merge request
Details of this project will also be presented at the DSFD 2021 conference.
We hope that you will find the outcome of this project useful, and that it helps you combine GPU-grade performance with the high-level programming interface of Palabos.