A question about non-local dynamics

Dear all,
I recently wanted to write a dynamics class that depends on vorticity, but I could not find an existing class that does what I need. So I added a new value of type T to the Cell class, and I use a BoxProcessingFunctional3D to compute the vorticity and store it in that new value. A modified Smagorinsky model then reads the new value from the Cell in the collision step. This works well when only one core is involved. With multiple cores, however, the result differs from the single-core result. I have been stuck on this problem for a long time, and I would really appreciate any advice on solving it.
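To make the non-local part concrete, here is a minimal sketch of a vorticity computation by central differences. It uses plain arrays rather than Palabos classes, and the names Vec3, VelocityField and vorticityAt are hypothetical, made up for illustration; the actual functional may differ. It assumes a lattice spacing of 1. The point it shows is that the value at one cell depends on the velocities of its six neighbours.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lattice: a dense nx*ny*nz array of velocities.
struct Vec3 { double x, y, z; };

struct VelocityField {
    std::size_t nx, ny, nz;
    std::vector<Vec3> data;

    VelocityField(std::size_t nx_, std::size_t ny_, std::size_t nz_)
        : nx(nx_), ny(ny_), nz(nz_), data(nx_ * ny_ * nz_, Vec3{0.0, 0.0, 0.0}) {}

    Vec3& at(std::size_t i, std::size_t j, std::size_t k) {
        return data[(i * ny + j) * nz + k];
    }
    const Vec3& at(std::size_t i, std::size_t j, std::size_t k) const {
        return data[(i * ny + j) * nz + k];
    }
};

// Vorticity (curl of u) at an interior cell, using second-order central
// differences with lattice spacing 1. Every term reads a neighbouring cell,
// which is what makes the quantity non-local.
Vec3 vorticityAt(const VelocityField& u, std::size_t i, std::size_t j, std::size_t k) {
    // Only interior cells have all six neighbours available.
    assert(i >= 1 && i + 1 < u.nx);
    assert(j >= 1 && j + 1 < u.ny);
    assert(k >= 1 && k + 1 < u.nz);

    double duz_dy = 0.5 * (u.at(i, j + 1, k).z - u.at(i, j - 1, k).z);
    double duy_dz = 0.5 * (u.at(i, j, k + 1).y - u.at(i, j, k - 1).y);
    double dux_dz = 0.5 * (u.at(i, j, k + 1).x - u.at(i, j, k - 1).x);
    double duz_dx = 0.5 * (u.at(i + 1, j, k).z - u.at(i - 1, j, k).z);
    double duy_dx = 0.5 * (u.at(i + 1, j, k).y - u.at(i - 1, j, k).y);
    double dux_dy = 0.5 * (u.at(i, j + 1, k).x - u.at(i, j - 1, k).x);

    return Vec3{duz_dy - duy_dz, dux_dz - duz_dx, duy_dx - dux_dy};
}
```

In this sketch the result at a cell is only meaningful if all six neighbouring velocities hold valid, up-to-date values when vorticityAt is called.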
Thanks for any help,
Guo