Parallelizing Reaction between Neighbors

Hello,

In trying to get my simulations to run in parallel, I’ve put almost all of my methods into data processing functionals. However, there is one set of methods, a reaction between some neighboring particles, that doesn’t require an iteration through some domain, but rather an iteration through a predefined list of neighbors. Basically I keep a vector of pairs, where each pair consists of a particle (x, y, z) and the directions in which the relevant neighbors lie. My question is how I can make this reaction effectively parallel: it seems that each node in the cluster needs to know which part of the lattice it is responsible for, so that rather than every node calculating the reaction for the whole system, each one only calculates the reaction for its own sub-domain. I figure there has to be some way to do this (is the lattice split up spatially as a grid, based on the number of processes and their ranks?), since this must already be done for normal data processing functionals and for collideAndStream in parallel. Thanks in advance.

–Arpon

Hi,

Classical data-processors get their argument in the form of a Box2D or Box3D, the rectangular domain on which they act. There exists, however, a variant thereof which works on lists of cell nodes. These data-processors must inherit from “DotProcessingFunctional” instead of “BoxProcessingFunctional”. There’s a very simple example which illustrates the usage of this type of data processor in examples/codesByTopic/dotList.
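
As an illustration, here is a minimal sketch of such a functional, modeled on the dotList example. The class name and the loop body are made up, and the exact set of virtual methods to override should be checked against the example code:

// Hypothetical dot-based data processor: acts only on the cells listed in a DotList3D.
template<typename T, template<typename U> class Descriptor>
class MyDotFunctional3D : public DotProcessingFunctional3D_L<T,Descriptor> {
public:
    virtual void process(DotList3D const& dotList, BlockLattice3D<T,Descriptor>& lattice)
    {
        // Palabos hands each block only the dots that fall into its own sub-domain.
        for (plint iDot = 0; iDot < dotList.getN(); ++iDot) {
            Dot3D dot = dotList.getDot(iDot);
            Cell<T,Descriptor>& cell = lattice.get(dot.x, dot.y, dot.z);
            // ... act on this cell only ...
        }
    }
    virtual MyDotFunctional3D<T,Descriptor>* clone() const {
        return new MyDotFunctional3D<T,Descriptor>(*this);
    }
    virtual void getModificationPattern(std::vector<bool>& isWritten) const {
        isWritten[0] = true; // the lattice may be modified
    }
};

// Usage sketch: build the list in global coordinates and apply the functional.
// DotList3D dotList;
// dotList.addDot(Dot3D(x, y, z));
// applyProcessingFunctional(new MyDotFunctional3D<T,DESCRIPTOR>(), dotList, lattice);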

I hope this helps, otherwise don’t hesitate to ask again.

Hi, thanks for the help. The DotProcessingFunctional is in the direction that I need to go, but I need to somehow pass in more than just the points. For each point in my list I also have a vector of ints, and that needs to get into the functional as well. I tried creating a class that inherits from Dot3D, but that didn’t work because I couldn’t retrieve the extra information after it had been cast to a Dot3D in the DotList. What would be the best way to do this, without losing efficiency?

Looking through the code, it doesn’t seem that I can pass in extra information to these functionals. So I guess the task is to figure out which processors are handling which sections of the lattice so that I can split up the reaction between the processors and make sure only the reactions in the relevant sections of the lattice are being calculated by each processor. I tried doing this with a MultiBlockDistribution, but it seems pretty slow. Since Palabos can handle this better, I need some way to retrieve which subdomain of the lattice is being handled by which processor. Is there some variable or method that provides this information?

From your explanation, I guess that there is nothing yet in Palabos which solves your problem out of the box. This means that you must do some manual implementation. But be careful not to go down to too low a level, and be sure to let Palabos handle things like parallelism automatically.

Here’s the idea.

In Palabos, data-processors represent algorithms, and multi-blocks represent containers. This means that whenever you want to store data (a vector of ints in your case), the data must go into a multi-block. In many cases you’ll allocate various multi-blocks and couple them together through data-processors. A good example is the He/Lee multi-phase model which is implemented in the newest Palabos release. It allocates many multi-blocks for the populations, for the pressure, for the chemical potential, etc. and couples them together appropriately.

There are multi-blocks for various purposes, but not for your case, as it seems. You therefore should go for the generic multi-block, the MultiContainerBlockXD. You can essentially put whatever data you want into a MultiContainerBlockXD, and Palabos just takes care of the parallelization. This means that if you couple a MultiContainerBlockXD with, say, a MultiBlockLatticeXD, your data-processor will get, as argument, a ContainerBlock-BlockLattice couple which is located on the same processor, and you then do what you need to do with the two of them.

As an example, I suggest that you have a look at the class StoreDynamicsFunctionalXD in dataProcessingFunctional/metaStuffFunctionalXD.h and .hh (and the usage of this class is shown correspondingly in metaStuffWrapperXD.h and .hh). This data-processor parses a block-lattice, identifies the ID of the dynamics (BGK, Bounce-Back, etc.) and stores a unique copy of these IDs inside a container-block, in a vector of ints just like in your case.

By the way, once you go for the container-block, you may decide to use the classical BoxProcessingFunctional instead of DotProcessingFunctional, and store the positions manually inside your container-block.
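
To make this concrete, here is a rough sketch of the container-block approach. All class names are invented, and the exact signatures should be double-checked against metaStuffFunctionalXD and the MultiContainerBlockXD headers; the idea is that the per-point data is wrapped in a ContainerBlockData subclass, stored inside the (local) container, and accessed from a data processor that couples the container with the lattice.

// Hypothetical payload stored inside the container-block.
class NeighborData : public ContainerBlockData {
public:
    std::vector<Dot3D> points;             // central points
    std::vector<std::vector<plint> > adj;  // directions of the interesting neighbors
    virtual NeighborData* clone() const {
        return new NeighborData(*this);
    }
};

// Hypothetical data processor coupling a lattice (block 0) with a container (block 1).
template<typename T, template<typename U> class Descriptor>
class ReactionFunctional3D : public BoxProcessingFunctional3D {
public:
    virtual void processGenericBlocks(Box3D domain, std::vector<AtomicBlock3D*> blocks)
    {
        BlockLattice3D<T,Descriptor>& lattice =
            *dynamic_cast<BlockLattice3D<T,Descriptor>*>(blocks[0]);
        AtomicContainerBlock3D& container =
            *dynamic_cast<AtomicContainerBlock3D*>(blocks[1]);
        // The data must have been stored into this (local) container beforehand,
        // e.g. by an earlier data processor calling container.setData(...).
        NeighborData* data = dynamic_cast<NeighborData*>(container.getData());
        if (!data) return;
        // ... loop over data->points and act on the corresponding lattice cells ...
    }
    virtual ReactionFunctional3D<T,Descriptor>* clone() const {
        return new ReactionFunctional3D<T,Descriptor>(*this);
    }
    virtual void getModificationPattern(std::vector<bool>& isWritten) const {
        isWritten[0] = true;   // the lattice is written
        isWritten[1] = false;  // the container is only read
    }
};

// Usage sketch: give the container the same block distribution as the lattice,
// then couple the two multi-blocks through the functional.
// MultiContainerBlock3D container(lattice);
// std::vector<MultiBlock3D*> args;
// args.push_back(&lattice);
// args.push_back(&container);
// applyProcessingFunctional(new ReactionFunctional3D<T,DESCRIPTOR>(), lattice.getBoundingBox(), args);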

I hope this explanation is useful and helps you with your implementation, although it requires a bit more technical work than usual. Just let us know how things are evolving; I am curious to know how easy/difficult it is for a non project-member to develop new stuff, and how we could improve the interface.

This worked perfectly, and it wasn’t too difficult to work with this generic processor/structure. I don’t know if I just missed it, but this should definitely be included in the user’s guide. Something this generic was really useful, and it will continue to be useful in the future, since the LBM and Palabos are so extensible. Thanks so much.

So, I thought I had this all sorted out, but I’ve run into a problem. Namely, something unexpected is happening at the place where the lattice is split between the two processors (or at the additional splits when there are more than two processors). Basically, in my reaction I am taking heat out of one cell and then distributing it among some of its neighbors. For some reason, it seems as if this reaction isn’t happening where the lattice is split in the data processing functional.

Here is an image: http://img405.imageshack.us/img405/3159/splitd.png

I thought at first that this had to do with the bulk vs. bulkAndEnvelope within the functionals, but when I tried changing the functional to bulkAndEnvelope (it was originally just bulk), the problem didn’t go away. Does anybody know why this is happening?

Just some new information:

So I pass two lattices into the functional (when I have just bulk; I tried splitting the reaction into two stages and using bulkAndEnvelope with only one lattice in each stage, but the following problem persists): from one lattice heat is being taken away, and in the other lattice heat is being absorbed. The heat seems to be taken away just fine, and the density/temperature is updated outside of the functional, but the same is not true for the second lattice. Within the functional the heat is absorbed just fine, but when I check outside of the functional, the density/temperature of the absorbing cell hasn’t changed. Is there some special way I have to treat updates to the second lattice?

Problem solved; I thought only the second stage of my reaction needed bulkAndEnvelope, but it turns out that both stages did.

Problem not solved; an error in my image output misled me. The division is still there. Any help?

So I believe that the problem has to do with the envelope that is passed into the functional. For some reason, the envelope is only extended in the y and z directions, and not in the x direction, which is why the division is appearing (that’s halfway in the x-direction). Why is this happening, or how could I fix it?

EDIT: Nope, I was mistaken again. This is not the problem. I only thought this because the x envelope wasn’t treated as a separate domain the way the y and z envelopes were. I’m still thoroughly perplexed about what is happening.

This problem is becoming more and more confusing. I’ve been playing around with things, and now, for some reason, my functionals that have bulkAndEnvelope for the appliesTo method don’t have the envelope passed in as domain(s). So I can’t act on the envelope directly without the domains, right? I don’t know why they’ve suddenly disappeared, because I don’t think I’ve changed my code much, and what I did change shouldn’t really affect this.

EDIT: I’ve fixed the bulkAndEnvelope issue; it was how I was instantiating my MultiContainerBlocks (nx, ny, nz parameters rather than a lattice parameter). But I’m still lost as to why heat is being lost at that division. Within the functional it seems that the heat is being transferred correctly, but apparently it isn’t.
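
(For reference, a two-line sketch of the difference mentioned here, under the assumption that the lattice-based constructor copies the lattice’s block distribution and envelopes:)

// Independent block with its own default distribution:
// MultiContainerBlock3D container(nx, ny, nz);
// Same block distribution (and envelopes) as the lattice:
MultiContainerBlock3D container(lattice);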

Hello,

I know that at this point it is not easy to develop complicated data processors, because the corresponding documentation is still lacking. As with everything else, we just need to find the time to write it.

It should not be too difficult to fix your code if we step through the whole thing carefully. If I understand things right, you are simultaneously accessing two lattices and a container-block through a data processor, right? In order to find the proper design for your code, you should specify the exact access pattern of your algorithm with respect to (1) read/write access and (2) local/non-local access. For example, your algorithm could look like:

  1. Non-local read from lattice 1 (non-local means: nearest-neighbor access, for example to compute gradients through finite differences).
  2. Local read from lattice 2.
  3. Non-local write to lattice 2.

Could you specify your algorithm in a similar way? It might be that you simply need to split everything into two different data processors and then everything is fine.

As a general rule, you should go for appliesTo->bulk only. Acting on the bulk and the envelope is an optimization which has its limitations and is somewhat tricky to use.

So, like you said, two lattices and a container-block are being accessed. Only the two lattices are being written, though. (Just a reminder that my container-block is storing the coordinates of a point and the coordinates of a few neighbors that I’m interested in for each point.)

(looped over coordinates in the container-block)

  1. Local read from lattice 1
  2. Local write to lattice 1
  3. (Potentially) non-local read from lattice 2
  4. (Potentially) non-local write to lattice 2

When I tried using bulkAndEnvelope, I split up steps 1 and 2 and steps 3 and 4 into two different data processors since only one lattice can be written.

It is not yet clear to me where the problem in your code is located. However, let me point out the following three facts, which in my experience often prevent problems of this kind.

  1. As mentioned before, you should go for the appliesTo->bulk version, as the rules for writing error-free code are easier in this case.
  2. If you write a lattice, you must let Palabos execute a communication step before you read data from the same lattice in a potentially non-local way. This is necessary in order for Palabos to update the variables in the envelope. The only way to obtain such a communication step is to have the write-operation and the subsequent read-operation in two different, subsequent data processors. If the data processors are executed manually (applyDataProcessor), then you’re fine: the communication step is automatically executed. If the data processors are internal (integrateDataProcessor), you must make sure the two data processors are not added at the same level. For example, you should add the first processor at level 1 and the second processor at level 2, because communication is only executed across different levels, for efficiency reasons (see the sketch after this list).
  3. It might be that the problem simply comes from the initialization. Make sure lattice.initialize() is executed on each lattice before you start the simulation.
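
To illustrate point 2, here is a sketch of the two ways of registering a pair of coupled processors. The functional names are placeholders, and the applyProcessingFunctional / integrateProcessingFunctional wrappers with a trailing level argument are assumed here; the exact signatures should be checked in the dataProcessorWrapper headers.

// Internal processors: register the two stages at different levels, so that Palabos
// performs an envelope update (communication) between them at every iteration.
integrateProcessingFunctional(
   new WriteStageFunctional, lattice1.getBoundingBox(), lattice1, lattice2, 1); // level 1: writes
integrateProcessingFunctional(
   new ReadStageFunctional, lattice1.getBoundingBox(), lattice1, lattice2, 2);  // level 2: non-local reads

// Manual execution: a communication step is performed automatically after each call.
applyProcessingFunctional(
   new WriteStageFunctional, lattice1.getBoundingBox(), lattice1, lattice2);
applyProcessingFunctional(
   new ReadStageFunctional, lattice1.getBoundingBox(), lattice1, lattice2);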

I hope this helps. If not, let us know, and we will look more carefully into your code.

So I actually changed the implementation of the reaction for the model. It is no longer easy to see the problem (of heat being lost between the divided sections of the lattice) in the images, but I can still see it numerically, since heat is being lost in the system (even though I’m currently using adiabatic boundaries in the x-direction and periodic boundaries in the y- and z-directions). I tried changing the old reaction to bulk and splitting the reading and writing into two processors, but that did not work. The same problem is happening with the new reaction implementation. Below is the relevant code you need to see to understand the reaction process, commented for clarity. Thanks for all your help; I hope this gives you some insight into the problem.


// neighbor data has the central point (x, y, z) and a vector of any of the 6 relevant directions that contain interesting neighbors
// so if x = 5, y = 5, z = 5, and adj = {0, 2, 3}, then when processing the point (5, 5, 5) we process the reaction in the 0th, 2nd, and 3rd directions (dx [i], dy [i], dz [i] store the delta_x, delta_y, and delta_z for the ith direction, used below)
struct neighbor
{
   plint x, y, z;
   vector <plint> adj;

   neighbor (plint x_, plint y_, plint z_, vector <plint> adj_)
      : x (x_), y (y_), z (z_), adj (adj_)
   { }
};

vector <neighbor> neighbor_list; // list of points and their neighbors
vector <vector <T> > reaction_temps; // for each central point, the temperatures I need to read in from the lattice

// read data from lattice that is needed for heat-transfer reaction
struct reaction_read : public BoxProcessingFunctional3D_LL <T, AD, T, AD>
{
   reaction_read ()
   { }

   virtual void process (Box3D domain, BlockLattice3D <T, AD>& lattice1, BlockLattice3D <T, AD>& lattice2)
   {
      Dot3D abs = lattice1.getLocation ();
      reaction_temps.clear (); // reset data storage vector
   
      for (plint i = 0; i < (plint) neighbor_list.size (); i++) // for all central points
      {
         vector <T> temp; // list of temperatures for this central point

         plint fx = neighbor_list [i].x, fy = neighbor_list [i].y, fz = neighbor_list [i].z; // central point (fx, fy, fz)
         temp.push_back (lattice2.get (fx, fy, fz).computeDensity ()); // the temperature of the central point

         for (plint j = 0; j < (plint) neighbor_list [i].adj.size (); j++) // for each neighbor
         {
            plint px = fx + dx [neighbor_list [i].adj [j]], // neighbor (px, py, pz)
                  py = (fy + abs.y + dy [neighbor_list [i].adj [j]] + ny) % ny - abs.y, // y and z are periodic, so convert to global
                  pz = (fz + abs.z + dz [neighbor_list [i].adj [j]] + nz) % nz - abs.z; // and then convert back to local coordinates

            temp.push_back (lattice1.get (px, py, pz).computeDensity ()); // temperature of neighbor
         }

         reaction_temps.push_back (temp); // add list for this central point to main data storage vector
      }
   }

   virtual reaction_read* clone () const
   {
      return new reaction_read (*this);
   }

   virtual void getModificationPattern (vector <bool>& isWritten) const
   {
      isWritten [0] = false;
      isWritten [1] = false;
   }

   virtual BlockDomain::DomainT appliesTo () const
   {
      return BlockDomain::bulk;
   }
};

// write the lattices, updating temperatures for heat-transfer reaction
struct reaction_write : public BoxProcessingFunctional3D_LL <T, AD, T, AD>
{
   reaction_write ()
   { }

   virtual void process (Box3D domain, BlockLattice3D <T, AD>& lattice1, BlockLattice3D <T, AD>& lattice2)
   {
      Dot3D abs = lattice1.getLocation ();
      
      Array <T, 3> u (0.0, 0.0, 0.0);
   
      for (plint i = 0; i < (plint) neighbor_list.size (); i++) // for all central points
      {
         plint fx = neighbor_list [i].x, fy = neighbor_list [i].y, fz = neighbor_list [i].z; // central point (fx, fy, fz)

         T transfer_total = 0.0; // keep track of all heat that will be added/subtracted to the central point

         for (plint j = 0; j < (plint) neighbor_list [i].adj.size (); j++) // for all neighbors
         {
            plint px = fx + dx [neighbor_list [i].adj [j]], // neighbor (px, py, pz)
                  py = (fy + abs.y + dy [neighbor_list [i].adj [j]] + ny) % ny - abs.y, // y and z are periodic, so convert to global
                  pz = (fz + abs.z + dz [neighbor_list [i].adj [j]] + nz) % nz - abs.z; // and then convert back to local coordinates

            // H = heat transfer coefficient, reaction_temps [i][0] = central point temperature, reaction_temps [i][j + 1] = neighbor temperature
            T transfer_local = H * (reaction_temps [i][0] - reaction_temps [i][j + 1]); // transfer based on temperature gradient

            // update neighbor temperature
            lattice1.get (px, py, pz).computeVelocity (u);
            iniCellAtEquilibrium (lattice1.get (px, py, pz), reaction_temps [i][j + 1] + transfer_local, u);

            // update total heat transfer to central point
            transfer_total += transfer_local;
         }

         // update central point temperature
         lattice2.get (fx, fy, fz).computeVelocity (u);
         iniCellAtEquilibrium (lattice2.get (fx, fy, fz), reaction_temps [i][0] - transfer_total, u);
      }
   }

   virtual reaction_write* clone () const
   {
      return new reaction_write (*this);
   }

   virtual void getModificationPattern (vector <bool>& isWritten) const
   {
      isWritten [0] = true;
      isWritten [1] = true;
   }

   virtual BlockDomain::DomainT appliesTo () const
   {
      return BlockDomain::bulk;
   }
};

Dear jlatt,

Speaking of the He/Lee multiphase model, do you have any example of how to use it?
Any suggestions would be appreciated.
Thanks.

jlatt Wrote:

[…] A good example is the He/Lee multi-phase model which is implemented in the newest Palabos release. It allocates many multi-blocks for the populations, for the pressure, for the chemical potential, etc. and couples them together appropriately. […]