The TLLBM algorithm is almost the same standard LBGK algorithm. I have implemented successfully the TLLBM in 2D and 3D and the process that I found to optimize the algorithm is as follows:

[ol]

Define your mesh

For each lattice in your mesh and all the velocities of your model (e.g. D2Q9) create the matrix S, which contains the coefficients of the Taylor Series expansion

Solve partially the least squares method (S[sup]T[/sup][sub]alpha[/sub]S[sub]alpha[/sub])[sup]-1[/sup]S[sup]T[/sup][sub]alpha[/sub] = A[sub]alpha[/sub], then extract and store the first row of the matrix A for each lattice and all time steps

Execute LBGK to get the collision part of the of neighbor nodes that you select to solve the system and that are assigned to g vector

Solve the rest part of the least squares method and assign the value to the distribution functions for the next time step : f[sub]alpha[/sub] = A[sub]alpha[/sub]*g, where alpha = 1,2,…,9 for D2Q9 model
[/ol]

If your method does not converge or the matrices S[sub]alpha[/sub] appears to be singular try to include more neighbor nodes. An algorithm for the TLLBM can be found here on page 15. Finally you must take into account that TLLBM executes the collision and streaming steps in a single step.