Using the MPI library on a quad-core desktop

I just installed the Open MPI library to use all 4 cores for my 3D simulations.

Everything compiles now and the simulation is currently running, but there is no real parallel computation going on. The main load is still on one core, which is at 100%; a second one is at 20%, and the others are below 10%. Sometimes the main core switches, but that's the only difference from the single-core case.

Do I need any special compiler flags, or are there other ways to improve the load on the other CPUs?


Is your simulation doing frequent I/O operations? That would certainly explain the bad performance; if so, comment out the I/O or make it less frequent. Also, have a look at the hints for improving parallel performance in the user’s guide:

About the internal statistics: so it would be best if the code looks like this:

double dx = parameters.getDeltaX();
double dt = parameters.getDeltaT();
for (plint iT=0; iT<30000; ++iT) {
    // Write an image and print the average energy every 999 iterations.
    if (iT%999==0) {
        pcout << "Writing image at dimensionless time " << iT*dt << endl;
        VtkImageOutput3D<double> vtkOut(createFileName("vtk", iT, 6), dx);
        vtkOut.writeData<3,float>(*computeVelocity(lattice), "velocity", dx/dt);
        pcout << computeAverageEnergy(lattice) << endl;
    }
    // Lattice Boltzmann iteration step.
    lattice.collideAndStream();
}

So I have to call lattice.collideAndStream(); an additional time when the internal statistics flag is changed?
Thanks for the other hints to improve the performance, I’ll reduce the number of images. But the main reason it performed so badly on the other cores was that I don’t like to read manuals :)

I stopped reading right after I knew how to compile everything, and missed the part about mpirun :)

Now I use it, and all 4 cores are working at maximum load most of the time :)
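On reducing the number of images: a quick way to see how much I/O a given output interval causes is to count the iterations that satisfy the iT%interval==0 test from the loop above. The sketch below does just that; countOutputs is a hypothetical helper made up for illustration, not a Palabos function.

```cpp
#include <cassert>

// Hypothetical helper (not part of Palabos): number of iterations iT in
// [0, maxIter) with iT % interval == 0, i.e. how many images get written.
long countOutputs(long maxIter, long interval) {
    assert(maxIter >= 0 && interval > 0);
    if (maxIter == 0) {
        return 0;  // the loop body never runs
    }
    // iT = 0 always matches, plus one match per full interval below maxIter.
    return (maxIter - 1) / interval + 1;
}
```

For the loop above, countOutputs(30000, 999) gives 31 images, while an interval of 5000 would give 6.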


So I have to call lattice.collideAndStream(); an additional time when the internal statistics flag is changed?

Yes, right: you need to (1) switch on the flag, (2) execute collideAndStream(), (3) get the result from the statistics, and (4) switch off the flag again.

But note that the performance gain from switching off the statistics is something we noticed on large parallel machines with hundreds or thousands of cores. I’d guess that it doesn’t matter on your quad-core.
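The four steps listed above can be sketched roughly as follows. To keep the example self-contained it uses a stand-in class, MockLattice, which is not the Palabos lattice: its method names and the way it "computes" an average energy are assumptions made for illustration only, so check the user's guide for the real calls.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for a lattice, NOT the Palabos class. It only mimics the idea that
// statistics are refreshed inside collideAndStream() while the flag is on.
class MockLattice {
public:
    explicit MockLattice(std::vector<double> energies)
        : cellEnergy(std::move(energies)) {}

    void toggleInternalStatistics(bool on) { statisticsOn = on; }

    void collideAndStream() {
        // A real solver would update the populations here; the mock just
        // halves every cell energy so that successive steps differ.
        for (double& e : cellEnergy) e *= 0.5;
        if (statisticsOn && !cellEnergy.empty()) {
            double sum = 0.0;
            for (double e : cellEnergy) sum += e;
            storedAverageEnergy = sum / cellEnergy.size();
        }
    }

    // Value recorded during the last step that ran with statistics on.
    double getStoredAverageEnergy() const { return storedAverageEnergy; }

private:
    std::vector<double> cellEnergy;
    bool statisticsOn = false;
    double storedAverageEnergy = 0.0;
};

// Steps (1) to (4) from the answer above.
double sampleAverageEnergy(MockLattice& lattice) {
    lattice.toggleInternalStatistics(true);            // (1) switch on the flag
    lattice.collideAndStream();                        // (2) one step with statistics on
    double energy = lattice.getStoredAverageEnergy();  // (3) read the result
    lattice.toggleInternalStatistics(false);           // (4) switch the flag off again
    return energy;
}
```

Note that the sampling step is itself a regular iteration, so it advances the simulation by one time step; steps taken while the flag is off leave the stored value untouched.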