Writing to Files with Palabos

gdhir · June 14, 2023, 1:24am

Hi,

I hope you are doing well. I am performing Proper Orthogonal Decomposition (POD) on a dataset of time snapshots of the entire domain, generated with Palabos. Each snapshot contains velocity and vorticity fields on a 1601*1601 spatial grid.

Palabos writes each snapshot to a separate text file, and each data file contains 1601*1601 floating-point values.

Loading each data file in Python for the POD analysis takes a very long time. I observed that when using

    plb_ofstream successiveProfiles;

    successiveProfiles <<
        // (2) Convert from lattice to physical units.
        *multiply(
            parameters.getDeltaX() / parameters.getDeltaT(),
            *computeVelocityComponent( lattice, 0 ))
        << endl;

to store the data, the whole field is written as a single row in the text file rather than in a column-oriented format.

In Python, pandas loads the data much faster when it is stored as a single column, and in that case it far outperforms np.loadtxt().

For example, for a 1601*1601 dataset stored in a single column, pd.read_csv() and np.loadtxt() took 2.89 seconds and 57.54 seconds respectively. That is roughly a 20x speedup for pd.read_csv() when the data is stored in a column-oriented format.

On the other hand, if the data is stored as a single row, pd.read_csv() still performs better, but the load times remain excessive. Is it possible to write the entire velocity/vorticity field from Palabos (C++) as a single column instead of a single row?
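
To make the request concrete, here is a minimal sketch of the layout I am after, written in plain standard C++ rather than with any Palabos API. It assumes the field has already been copied into a flat std::vector<double>. The helper name writeColumn and the file name passed to it are hypothetical, and how to get the values out of the Palabos scalar field is exactly the part I am unsure about.

```cpp
#include <fstream>
#include <iomanip>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Palabos): writes one value per line,
// so the resulting text file holds a single column.
void writeColumn(const std::vector<double>& field, const std::string& fileName)
{
    std::ofstream out(fileName);
    if (!out) {
        throw std::runtime_error("cannot open " + fileName);
    }
    // Enough digits to read a double back without losing precision.
    out << std::setprecision(std::numeric_limits<double>::max_digits10);
    for (double value : field) {
        // '\n' instead of std::endl avoids flushing the stream on every line.
        out << value << '\n';
    }
}
```

A file written this way has 1601*1601 lines for one snapshot, which is the layout that loaded quickly with pd.read_csv() in my tests. In a parallel run, I assume only one process should call a function like this.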

Storing the data in Parquet files gives a further speedup: load times in Python drop to around 0.89 seconds. Is there a way to write to Parquet or HDF file formats from Palabos, since these formats load so much faster?
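
One fallback I can think of that avoids extra libraries is a raw binary dump. To be clear, this is neither Parquet nor HDF, just a swapped-in alternative using standard C++ streams. The sketch below again assumes the field is available as a flat std::vector<double>, and the helper name writeRawBinary is hypothetical.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Palabos): dumps the values as raw
// 8-byte doubles in the machine's native byte order, with no header.
void writeRawBinary(const std::vector<double>& field, const std::string& fileName)
{
    std::ofstream out(fileName, std::ios::binary);
    if (!out) {
        throw std::runtime_error("cannot open " + fileName);
    }
    out.write(reinterpret_cast<const char*>(field.data()),
              static_cast<std::streamsize>(field.size() * sizeof(double)));
    if (!out) {
        throw std::runtime_error("write failed for " + fileName);
    }
}
```

On the Python side such a file could be read with np.fromfile(fileName, dtype=np.float64) and then reshaped to (1601, 1601). The drawback is that the file stores no shape or byte-order information, so the reader has to know both in advance, which is why I would still prefer a self-describing format such as Parquet or HDF if Palabos can write one.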

Thanks for your time and I look forward to your reply.