Benchmarking a dask-parquet-s3 workflow

Foreword

Benchmarking is hard and is usually biased by the author’s experience and viewpoint, which strongly affect what they choose to benchmark and how many “tricks” they know for optimising performance. This article extends a benchmark published on the Coiled blog, which showed that using PyArrow strings rather than Python string objects is beneficial for […]