
Alternatives to Spark for plain Python

For our data-intensive recipes, we use PySpark to distribute calculations on a Kubernetes cluster. However, we also have compute-intensive models (e.g. simulation-based) that we would like to distribute across multiple machines, and my question is whether Spark is still the best way to do that in DSS. Our criteria are:

1. Startup time to begin simulations on the cluster

2. Costs of using the cluster. I'm mainly referring to the container image size, but there may be other cost drivers here, too.

3. Usability/configuration/maintenance. Spark is very simple to use from a recipe, both in code and from the UI, and we'd really like the same to be true of any alternative technology.

4. Anything else important that I'm missing?
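To make the workload concrete, here is a minimal, simplified sketch of the kind of embarrassingly parallel simulation I mean (the `run_simulation` function and the Monte Carlo pi estimate are placeholders, not our actual model). Locally we can fan runs out over processes with the standard library's `concurrent.futures`; what we're after is the same map-over-seeds pattern, but across cluster nodes:

```python
# Illustrative sketch (not our real model): an embarrassingly parallel,
# compute-intensive simulation, here a Monte Carlo estimate of pi.
import random
from concurrent.futures import ProcessPoolExecutor


def run_simulation(seed, n=10_000):
    """One independent simulation run; each run only needs a seed."""
    rng = random.Random(seed)
    # Count random points in the unit square that fall inside the unit circle.
    hits = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
    return 4.0 * hits / n


if __name__ == "__main__":
    # On a single machine, fan the independent runs out over local processes.
    # We want this same fan-out, but distributed over the cluster.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_simulation, range(100)))
    print(sum(results) / len(results))  # averages out close to pi
```

For what it's worth, my understanding is that frameworks such as Dask (`Client.map`) and Ray (`@ray.remote` tasks) expose essentially this same pattern on a Kubernetes cluster, which is why I'm asking how they compare to Spark within DSS.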

Thanks in advance!


