Results of ReductiveBoxProcessingFunctional parallelization-dependent?

Dear community,

I am currently working on an implementation of the immersed boundary method by Noble and Torczynski [1]. The aim is to couple a DEM simulation to Palabos. To obtain the forces on the particles, I need to compute sums over space, and I need these sums to be independent of the underlying domain decomposition of my MultiBlockLattice. I tried implementing the force evaluation by deriving from ReductiveBoxProcessingFunctional, but the results depend on the number of processes I use. I assume this dependence comes from the (necessary) overlap of the BlockLattices that make up the MultiBlockLattice, so that cells in the overlap regions are counted more than once. So my question is: is there a way to get around this and obtain sums that do not depend on the parallelization?
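
To illustrate what I suspect is happening, here is a minimal sketch that does not use Palabos at all. The `Block` struct and the functions `decompose`, `sumWithEnvelope` and `sumBulkOnly` are made up for this illustration and are not Palabos API. The sketch assumes a 1D field split into contiguous blocks, each storing a few extra overlap cells on either side that duplicate its neighbours' cells.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one block of a decomposed 1D lattice.
// [begin, end) is the bulk owned by the block; the block additionally stores
// `envelope` cells on each side, which are copies of neighbouring bulk cells.
struct Block {
    int begin;
    int end;
    int envelope;
};

// Split the global index range [0, n) into numBlocks contiguous bulks.
std::vector<Block> decompose(int n, int numBlocks, int envelope) {
    assert(n > 0 && numBlocks > 0 && envelope >= 0);
    std::vector<Block> blocks;
    for (int b = 0; b < numBlocks; ++b) {
        int begin = static_cast<int>(static_cast<long long>(n) * b / numBlocks);
        int end = static_cast<int>(static_cast<long long>(n) * (b + 1) / numBlocks);
        blocks.push_back(Block{begin, end, envelope});
    }
    return blocks;
}

// Sum over every cell each block stores (bulk plus envelope, clipped to the
// global domain). Cells in the overlap regions are visited more than once.
double sumWithEnvelope(const std::vector<double>& field,
                       const std::vector<Block>& blocks) {
    const int n = static_cast<int>(field.size());
    double total = 0.0;
    for (const Block& blk : blocks) {
        int lo = std::max(0, blk.begin - blk.envelope);
        int hi = std::min(n, blk.end + blk.envelope);
        for (int i = lo; i < hi; ++i) {
            total += field[static_cast<std::size_t>(i)];
        }
    }
    return total;
}

// Sum over the bulk of each block only, so every global cell is visited
// exactly once regardless of how many blocks there are.
double sumBulkOnly(const std::vector<double>& field,
                   const std::vector<Block>& blocks) {
    double total = 0.0;
    for (const Block& blk : blocks) {
        for (int i = blk.begin; i < blk.end; ++i) {
            total += field[static_cast<std::size_t>(i)];
        }
    }
    return total;
}
```

With a field of 12 cells all set to 1.0 and an overlap of one cell, `sumBulkOnly` gives 12 for one, two or three blocks, whereas `sumWithEnvelope` gives 12, 14 and 16. Whether my functional actually visits the overlap cells in this way is exactly what I am unsure about.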