# Optimization of my first LBM

I’m working on LBM in subsonic flow, i’m just starting to use this method. I’m trying to show the flow inside a flute. In this code i attempt to load a .bmp binary image that has the information about the obstacles. Then i tried to implement the method based on the book of the professor Michael Sukop “Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers” I already run this code in order to simulate a simple pouiseuille flow in a tube (the .bmp image "inlflabi"was matrix of zeros with ones on the first and the last row) and it seem to work fine. But right now i’m trying to simulate the flut, but the matrix is huge and it takes too many to give an answer.

Is there any method to optimizate my code? Please i need some help!!!

By the wy the code is in MatLab, is not very efficient, but for my working on c is not trivial. I’m just an undergraduate electronic engineering student. But I love CDF!!!

Daniel

clear all
close all
clc

%% Image lecture

%% Constants

[LX LY]=size(is_inner);

rho=double(0.001204*ones(LX,LY));

cs2=1/3;

tau=1;

cuanue=4/9;
unue=1/9;
utrei=1/36;

f1=3;
f2=9/2;
f3=3/2;

ex=[0 1 0 -1 0 1 -1 -1 1];
ey=[0 0 1 0 -1 1 1 -1 -1];

x=1:LX;
y=1:LY;

in=circshift(x,[0 1]);
ip=circshift(x,[0 -1]);
jn=circshift(y,[0 1]);
jp=circshift(y,[0 -1]);

rt(x,y,1)=cuanuerho(x,y);
rt(x,y,2)=unue
rho(x,y);
rt(x,y,3)=utrei*rho(x,y);

Nis_inner=~is_inner;

%% Matrix Inicialization

% Inicialization MACROSCOPIC VARIABLES
ux=zeros(LX,LY);
uy=zeros(LX,LY);

%Inicialization PROPAGATION
f=zeros(LX,LY,9);
ftemp=zeros(LX,LY,9);

%Inicialization EQ. DISTRIBUTION FUNCTION
feq=zeros(LX,LY,9);
uxeq=zeros(LX,LY);
uyeq=zeros(LX,LY);

MAXIT=input(‘NUMBER OF ITERATIONS=’);

f(:,:,1)=rt(:,:,1).*Nis_inner;

f(:,:,2)=rt(:,:,2).*Nis_inner;
f(:,:,3)=rt(:,:,2).*Nis_inner;
f(:,:,4)=rt(:,:,2).*Nis_inner;
f(:,:,5)=rt(:,:,2).*Nis_inner;

f(:,:,6)=rt(:,:,3).*Nis_inner;
f(:,:,7)=rt(:,:,3).*Nis_inner;
f(:,:,8)=rt(:,:,3).*Nis_inner;
f(:,:,9)=rt(:,:,3).*Nis_inner;

for iter=1:MAXIT

%% Macroscopic variables, Eq. Distribution Function & Colision

``````for i=1:LX
for j=1:LY

if ~is_solid(i,j)

if Nis_inner(i,j)

rho(i,j)=sum(f(i,j,:));
ux(i,j)=sum(ex.*reshape(f(i,j,:),[1,9,1]))/rho(i,j);
uy(i,j)=sum(ey.*reshape(f(i,j,:),[1,9,1]))/rho(i,j);

end

uxeq(i,j)=ux(i,j);
uyeq(i,j)=uy(i,j);

rt(i,j,1)=cuanue.*rho(i,j);
rt(i,j,2)=unue.*rho(i,j);
rt(i,j,3)=utrei.*rho(i,j);

uxsq=uxeq(i,j)*uxeq(i,j);
uysq=uyeq(i,j)*uyeq(i,j);

uxuy6=uxeq(i,j)+uyeq(i,j);
uxuy7=-uxeq(i,j)+uyeq(i,j);
uxuy8=-uxeq(i,j)-uyeq(i,j);
uxuy9=uxeq(i,j)-uyeq(i,j);

usq=uxsq+uysq;

feq(i,j,1)=rt(i,j,1)*(1-f3*usq);

feq(i,j,2)=rt(i,j,2)*(1+f1*uxeq(i,j)+f2*uxsq-f3*usq);
feq(i,j,3)=rt(i,j,2)*(1+f1*uyeq(i,j)+f2*uysq-f3*usq);
feq(i,j,4)=rt(i,j,2)*(1-f1*uxeq(i,j)+f2*uxsq-f3*usq);
feq(i,j,5)=rt(i,j,2)*(1-f1*uyeq(i,j)+f2*uysq-f3*usq);

feq(i,j,6)=rt(i,j,3)*(1+f1*uxuy6+f2*uxuy6*uxuy6-f3*usq);
feq(i,j,7)=rt(i,j,3)*(1+f1*uxuy7+f2*uxuy7*uxuy7-f3*usq);
feq(i,j,8)=rt(i,j,3)*(1+f1*uxuy8+f2*uxuy8*uxuy8-f3*usq);
feq(i,j,9)=rt(i,j,3)*(1+f1*uxuy9+f2*uxuy9*uxuy9-f3*usq);

f(i,j,:) = f(i,j,:)-( f(i,j,:) - feq(i,j,:))/tau;

else

temp=f(i,j,2);
f(i,j,2)=f(i,j,4);
f(i,j,4)=temp;

temp=f(i,j,3);
f(i,j,3)=f(i,j,5);
f(i,j,5)=temp;

temp=f(i,j,6);
f(i,j,6)=f(i,j,8);
f(i,j,8)=temp;

temp=f(i,j,7);
f(i,j,7)=f(i,j,9);
f(i,j,9)=temp;
end
end
end

guar(:,:,iter)=rho(441:541,1:133);
``````

%% Propagation

``````% C7    C3    C6
%    \      |     /
%       \   |   /
% C4----C1----C2
%       /   |   \
%     /     |     \
%  C8   C5    C9

for i=1:LX
for j=1:LY

ftemp(x,y,1) = f(x,y,1);
ftemp(ip,y,2) = f(x,y,2);
ftemp(x,jp,3) = f(x,y,3);
ftemp(in,y,4) = f(x,y,4);
ftemp(x,jn,5) = f(x,y,5);
ftemp(ip,jp,6) = f(x,y,6);
ftemp(in,jp,7) = f(x,y,7);
ftemp(in,jn,8) = f(x,y,8);
ftemp(ip,jn,9) = f(x,y,9);

end
end
``````

%% Periodic BC’s

``````ftemp(250,180:749,2)=f(950,180:749,2);
ftemp(950,180:749,4)=f(250,180:749,4);

ftemp(250,181:750,6)=f(950,180:749,6);
ftemp(950,181:750,7)=f(250,180:749,7);
ftemp(950,180:749,8)=f(250,181:750,8);
ftemp(250,180:749,9)=f(950,181:750,9);
``````

%% Pressure and velocity BC’s

``````rhoW = 0.0016;
rhoE = 0.001204;
for j = 1:141
f_j = ftemp(1,j, :);
rho_f_j = rho(1,j);

ruW=-rhoW*ux0;                         % Pressure and velocity fixed
f_j(2) = f_j(4) - (2./3.)*ruW;
f_j(6) = f_j(8) - (1./6.)*ruW + (1./2.)*(f_j(5)-f_j(3));
f_j(9) = f_j(7) - (1./6.)*ruW + (1./2.)*(f_j(3)-f_j(5));
ftemp(1,j,:) = f_j;

f_j = ftemp(LX,j, :);
u_x0E = -1. + (f_j(1) + f_j(3) + f_j(5) + 2.*(f_j(2) + f_j(6) + f_j(9)))/rhoE;
ruE = rhoE*u_x0E;

f_j(4) = f_j(2) - (2./3.)*ruE;
f_j(8) = f_j(6) - (1./6.)*ruE + (1./2.)*(f_j(3)-f_j(5));
%f_j(8) = f_j(6) - (1./6.)*ruE + (1./2.)*(f_j(3)-f_j(5)) - (1./4.)*g*rhoE; % hydrostatic PBCs
f_j(7) = f_j(9) - (1./6.)*ruE + (1./2.)*(f_j(5)-f_j(3));
%f_j(7) = f_j(9) - (1./6.)*ruE + (1./2.)*(f_j(5)-f_j(3))- (1./4.)*g*rhoE; % hydrostatic PBCs
ftemp(LX,j,:) = f_j;
end
``````

%% Velocity BC’s

``````rhoN=0.001204;

for i = 27:210
f_i = ftemp(i,LY,:);
u_y0N = -1. + (f_i(1) + f_i(2) + f_i(4) + 2.*(f_i(3) + f_i(6) + f_i(7)))/rhoN;
ruN = rhoN*u_y0N;

f_i(5) = f_i(3) - (2./3.)*ruN;
f_i(8) = f_i(6) - (1./6.)*ruN + (1./2.)*(f_i(2)-f_i(4));
f_i(9) = f_i(7) - (1./6.)*ruN + (1./2.)*(f_i(4)-f_i(2));
ftemp(i,LY,:)=f_i;
end

f(:,:,:)=ftemp(:,:,:);
``````

end

Hello,
I don’t know the size of your domain but it seems that it more or less 1000x1000 (from what I read). You have 32 matrices (27 f_i’s, rho, 4 u’s). You are working in double precision there fore you have to multiply this number by 8 which makes about 3*10^8 bytes of memory. Which depending on the memory available on your computer may be quite big (since matlab by itself already uses some resources)… On the other side on the lbmethod.org matlab doe more or less 0.1 Msu/s (Megasite update per second). Therefore you need 10 sec for one time step. Without too much optimization you have also the c code that computes 1Msu/s, whoch is already an order of magnitude.

There are of course many ways to optimize the code. The most important one is to avoid to jump in memory when doing your loops on the elements of your matrices. You also can try to unroll the loops and stuff like that. But it may be shorter to code in a compiled language.

Thanks… so C will be.

I will translate the code and try it again.