I hope you are doing well. I am currently working on performing Proper Orthogonal Decomposition (POD) on a dataset of time snapshots of the entire domain, generated with Palabos. The dataset contains velocity and vorticity fields on a 1601*1601 spatial grid.
Each snapshot is written by Palabos to a separate text file, and each file contains 1601*1601 floating-point values.
I am currently seeing very long load times for each data file in Python when performing the POD analysis. I noticed that when using

    // (2) Convert from lattice to physical units.
    ofile << *multiply(
        parameters.getDeltaX() / parameters.getDeltaT(),
        *computeVelocityComponent(lattice, 0));

for data storage, the data is written to the text file as a single row rather than in a column-oriented format.
When using Pandas in Python, I find that pd.read_csv() loads much faster when the data is stored as a column, and it completely outperforms np.loadtxt() in that case.
For example, for a 1601*1601 dataset stored in a single column, pd.read_csv() took 2.89 seconds while np.loadtxt() took 57.54 seconds, i.e. roughly a 20x speedup when the data is stored in a column-oriented format.
On the other hand, if the data is stored as a single row, pd.read_csv() still performs better than np.loadtxt(), but the load times remain excessive. I wanted to know whether it is possible to write the entire velocity/vorticity field from Palabos as a single column instead of a single row from C++.
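For context, what I have in mind is something like the following plain-C++ sketch. It has no Palabos dependencies and assumes the field values have already been flattened into a contiguous std::vector<double> (the function name and buffer layout are my own illustration, not Palabos API):

```cpp
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>

// Write a flattened 2D field one value per line (column-oriented),
// so that pd.read_csv() can parse it quickly on the Python side.
void writeFieldAsColumn(const std::vector<double>& field,
                        const std::string& fileName)
{
    std::ofstream ofile(fileName);
    ofile << std::setprecision(12);
    for (double value : field) {
        ofile << value << '\n';  // one value per line instead of one long row
    }
}
```

Is there a clean way to obtain this one-value-per-line layout directly from the Palabos output routines?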
Furthermore, if the data is stored in Parquet files, further speedups are obtained and load times in Python drop to around 0.89 seconds. I wanted to know whether there is a way to write to Parquet or HDF5 file formats from C++, since these formats offer a large speedup.
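As a stopgap while I look into proper Parquet/HDF5 output (e.g. via the Apache Arrow C++ library or the HDF5 C API), I am also considering dumping raw binary, which Python can load with np.fromfile() and so avoids text parsing entirely. A minimal sketch, again assuming the field is already in a contiguous buffer (the function name is mine, not a Palabos routine):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Dump a flattened field as raw native-endian doubles. On the Python
// side this can be loaded with
//   np.fromfile("field.bin", dtype=np.float64).reshape(1601, 1601)
// Note: the file is only portable between machines of the same endianness.
void writeFieldBinary(const std::vector<double>& field,
                      const std::string& fileName)
{
    std::ofstream ofile(fileName, std::ios::binary);
    ofile.write(reinterpret_cast<const char*>(field.data()),
                static_cast<std::streamsize>(field.size() * sizeof(double)));
}
```

Would a binary dump like this be a reasonable interim solution, or is there a recommended way to get Parquet/HDF5 directly out of Palabos?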
Thanks for your time and I look forward to your reply.