Gathering ScalarField3D blocks makes the program hang in an infinite loop!

Palabos Version: v1.4r1

I just want to gather the flag matrix onto the main processor. Below is the code (just a demo):

#include "palabos3D.h"
#include "palabos3D.hh"

using namespace std;
using namespace plb;

typedef double T;

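int main(int argc, char* argv[])
{
    plbInit(&argc, &argv);

    const plint nx = 100;
    const plint ny = 100;
    const plint nz = 100;
    // Distributed flag matrix, split across all MPI processes
    MultiScalarField3D<int> flagMatrix(nx, ny, nz);
    // Non-distributed field of the same size, meant to receive the gathered data
    ScalarField3D<T> localFlagMatrix(flagMatrix.getNx(), flagMatrix.getNy(), flagMatrix.getNz());
    copySerializedBlock(flagMatrix, localFlagMatrix);
    // Operating on localFlagMatrix
    return 0;
}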
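
For clarity, the result expected from the gather can be sketched without Palabos or MPI: every sub-block of the decomposed domain is copied into one dense nx*ny*nz array at its global offsets. This is only an illustration of the intended outcome, not of how Palabos does it internally. The names Block, makeBlock and gatherBlocks are hypothetical and are not part of the Palabos API, and the z-fastest memory layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the sub-domain owned by one process (not a Palabos type).
struct Block {
    int x0, x1, y0, y1, z0, z1;  // inclusive bounds in global coordinates
    std::vector<int> data;       // local flag values, z running fastest (assumed layout)
};

// Build a block whose cells all carry the same flag value.
Block makeBlock(int x0, int x1, int y0, int y1, int z0, int z1, int flag) {
    std::size_t n = static_cast<std::size_t>(x1 - x0 + 1) * (y1 - y0 + 1) * (z1 - z0 + 1);
    return Block{x0, x1, y0, y1, z0, z1, std::vector<int>(n, flag)};
}

// Copy every block into one dense nx*ny*nz array; cells covered by no block stay at -1.
std::vector<int> gatherBlocks(const std::vector<Block>& blocks, int nx, int ny, int nz) {
    std::vector<int> full(static_cast<std::size_t>(nx) * ny * nz, -1);
    for (const Block& b : blocks) {
        // Each block must lie inside the global domain.
        assert(b.x0 >= 0 && b.y0 >= 0 && b.z0 >= 0);
        assert(b.x1 < nx && b.y1 < ny && b.z1 < nz);
        std::size_t i = 0;
        for (int x = b.x0; x <= b.x1; ++x)
            for (int y = b.y0; y <= b.y1; ++y)
                for (int z = b.z0; z <= b.z1; ++z)
                    full[(static_cast<std::size_t>(x) * ny + y) * nz + z] = b.data[i++];
    }
    return full;
}
```

In the real program the blocks live on different processes, so this copy has to go through MPI communication; the sketch only shows the layout that localFlagMatrix is expected to have afterwards.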


The problem is:
When I use a single processor (i.e. "-np 1"), the code runs normally.
But when I use multiple processors ("-np N" with N > 1), the program seems to get stuck in an infinite loop.

I traced the program's execution path and found that the problem may lie in the [b]copy function call[/b] in

MultiScalarField3D<T>* MultiScalarField3D<T>::clone(MultiBlockManagement3D const& newManagement) const



The file is src/multiBlock/multiDataField3D.hh.