Parallelism & multi-block

Hello everybody,

I have a problem concerning the multi-block lattice. I want to run the code in parallel on several cores, and my understanding is that I just have to use the multi-block lattice and change the Makefile (turn the MPIparallel flag on and change the directory in parallelCXX). And indeed, this usually works with the command "mpirun -np x filename", where x equals the number of cores.

But now I am working on a new code, and instead of splitting the multi-block into several atomic blocks, it just defines one atomic block with a size equal to the whole multi-block. As a result, each core runs the complete simulation, so that for example with "mpirun -np 4 filename" I get the output of 4 identical simulations instead of one faster simulation.

I know there are several things to avoid for fast parallel execution, such as output commands and lattice loops that don't go through data processors. But what could be the reason for the missing division into atomic blocks?

Thank you for your help!

Could you post the parts of your code where you create your lattice? This might help in tracking down the error.


Hey, thank you for your answer,

but I just found the mistake myself: I forgot the function plbInit, which is apparently also necessary for parallelism.


Hello everybody,

I now have a new problem and would be glad to get some help from you : )

The example below has no deeper purpose but represents my problem:
I divide the multi-block into 3 atomic blocks manually, and then I want to get the probability densities of one direction (direction 2 in this example). But the program displays only the values of the first atomic block, and I don't know why.

I use only the bulk without the envelope because I don't want to change dynamics or the internal state. To my understanding I do not need global indices in this example because I don't make space-dependent decisions. I don't change variables, so I use modif::nothing. I also tested modif::staticVariables.

#include "palabos2D.h"
#include "palabos2D.hh"

using namespace plb;
using namespace std;
typedef double T;
Array<T,2> ustart(0.,0.);

#define DESCRIPTOR plb::descriptors::D2Q9Descriptor

template<typename T, template<typename U> class Descriptor>
class dataprocessor : public BoxProcessingFunctional2D_L<T,Descriptor> {
public:
    virtual BlockDomain::DomainT appliesTo() const
    { return BlockDomain::bulk; }   // bulk only; the envelope is not needed here

    virtual void getModificationPattern(std::vector<bool>& isWritten) const
    { isWritten[0] = false; }   // nothing is written

    virtual void getTypeOfModification(std::vector<modif::ModifT>& modified) const
    { modified[0] = modif::nothing; }   // only one block (the lattice), and it is not modified

    virtual dataprocessor<T,Descriptor>* clone() const
    { return new dataprocessor<T,Descriptor>(*this); }

    virtual void process(Box2D domain, BlockLattice2D<T,Descriptor>& lattice)
    {
        for (plint iX=domain.x0; iX<=domain.x1; ++iX) {
            for (plint iY=domain.y0; iY<=domain.y1; ++iY) {
                Cell<T,Descriptor>& cell = lattice.get(iX,iY);
                pcout << cell[2] << endl;   // probability density of direction 2
            }
        }
    }
};
int main(int argc, char* argv[])
{
    plbInit(&argc, &argv);

    T omega = (T)0.2857;
    plint nx = 15;
    plint ny = 1;
    plint maxt = 3;

    plint x0 = 5;
    plint x1 = 9;

    plint envelopeWidth = 1;
    SparseBlockStructure2D sparseBlock(nx,ny);
    // Divide the multi-block manually into 3 atomic blocks: [0..4], [5..9], [10..14].
    sparseBlock.addBlock(Box2D(0,    x0-1, 0, ny-1), 0);
    sparseBlock.addBlock(Box2D(x0,   x1,   0, ny-1), 1);
    sparseBlock.addBlock(Box2D(x1+1, nx-1, 0, ny-1), 2);

    MultiBlockLattice2D<T, DESCRIPTOR> lattice (
        MultiBlockManagement2D(sparseBlock, defaultMultiBlockPolicy2D().getThreadAttribution(), envelopeWidth),
        new BGKdynamics<T,DESCRIPTOR>(omega) );

    Box2D PG(0,14,0,0);

    for (plint it=0; it<maxt; ++it) {
        applyProcessingFunctional(new dataprocessor<T,DESCRIPTOR>(), PG, lattice);
    }

    return 0;
}

Regards Palabosfan

Hello everybody,

I'm still working on a solution for my problem. I would appreciate your help!

Greetings Palabosfan

This is because you use pcout instead of std::cout. pcout only writes on processor 0, so you only see the output of the atomic block(s) assigned to that processor — in your case the first one.


Hey Philippe,

thank you very much for your answer! This information is the solution for my problem.

Now I have a new question (hopefully the last : ) ) concerning parallelism. In one of my codes I sum up a variable in a loop inside a data processor. But after the execution of the data processor I don't get the total sum over all atomic blocks; I just get a separate sum for each atomic block.

I googled this problem and found the #pragma omp commands, but I don't have much knowledge about MPI, and to my understanding Palabos implements MPI in its own way for data processors. So, when working with data processors, is there a possibility to get the total sum over all atomic blocks?

Greetings Palabosfan!

I solved the problem by using the MPI commands MPI_Reduce and MPI_Bcast.