Re: Extremely slow performance with shapefile and geopackage


Sean Gillies
 

René, this is all good advice.

In the past, saving to a GeoPackage could be especially slow because of transaction overhead: every Collection write() call ran in its own transaction. Since version 1.8 of Fiona, writerecords() uses a default transaction size of 20,000 features (see https://fiona.readthedocs.io/en/latest/README.html#a1-2017-11-06) and is much faster.
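To make the difference concrete, here is a minimal sketch of the two write patterns. The schema, the sample point features, the function names and the "EPSG:4326" CRS string are made up for illustration, and older Fiona versions may want the CRS in a different form, so treat this as an outline rather than a recipe.

```python
def point_feature(i):
    """Build one GeoJSON-like record (hypothetical sample data)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": (float(i), float(i))},
        "properties": {"id": i},
    }


# Hypothetical schema matching the records built by point_feature().
SCHEMA = {"geometry": "Point", "properties": {"id": "int"}}


def write_gpkg(path, features, crs="EPSG:4326"):
    """Write all features to a GeoPackage with a single writerecords() call."""
    # Imported here so the helpers above can be used without Fiona installed.
    import fiona

    with fiona.open(path, "w", driver="GPKG", schema=SCHEMA, crs=crs) as dst:
        # Pattern that was slow in older versions, one call per feature:
        #     for feature in features:
        #         dst.write(feature)
        # Passing the whole iterable lets Fiona group the writes.
        dst.writerecords(features)
```

Calling write_gpkg("points.gpkg", [point_feature(i) for i in range(100000)]) would then write everything through one writerecords() call.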

Geopandas has used the faster method since https://github.com/geopandas/geopandas/issues/557#issuecomment-332202764. You may want to check whether upgrading geopandas improves your situation.

On Thu, Nov 14, 2019 at 8:33 AM René Buffat <buffat@...> wrote:
Hi David

First, I would check whether you have enough RAM available. If not, that could explain the slow performance.
In that case, I would recommend reading, processing and writing the data in batches.
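A rough sketch of what batching could look like, in plain Python. The names batches and process_in_batches are hypothetical, and the batch size of 20,000 is only an assumed starting point to tune against the available memory.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items without loading everything at once."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_in_batches(records, process, write, size=20000):
    """Read, process and write one batch at a time; return the record count."""
    total = 0
    for batch in batches(records, size):
        processed = [process(record) for record in batch]
        write(processed)  # only one batch is held in memory here
        total += len(processed)
    return total
```

With Fiona, records could be an open source collection and write could be the writerecords method of the destination collection, assuming process returns records that match the destination schema.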

Otherwise, there are a lot of parameters that can impact the performance, e.g. how complex the geometries are, how many rows you want to write, how many parallel reads and writes you have to the disk, etc.

Regarding the geometry problems, I'm not entirely sure what you mean. But regardless, with big datasets it's always a good idea to debug with a smaller dataset first (e.g. the first thousand lines) and check that everything works.

And fully unrelated, I would recommend using os.path.join(data_dir, "data.gpkg") instead of data_dir+"data.gpkg"
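A small illustration of why: plain concatenation silently produces the wrong path when the directory has no trailing separator. The directory and file names below are placeholders, not taken from the original script.

```python
import os

data_dir = os.path.join("projects", "gis")  # placeholder, no trailing separator

# Concatenation glues the file name onto the last directory component.
naive = data_dir + "data.gpkg"

# os.path.join inserts the platform's separator where needed.
joined = os.path.join(data_dir, "data.gpkg")

print(naive)
print(joined)
```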

lg rene



--
Sean Gillies
