
Alternatives to Spark for plain Python

NikolayK
Level 2

Hello,

For our data-intensive recipes, we use PySpark to distribute calculations on a Kubernetes cluster. However, we also have compute-intensive models (e.g., simulation-based) that we would like to distribute across multiple machines, and my question is whether Spark is still the best way to do that in DSS. Our criteria are:

1. Startup time to begin simulations on the cluster

2. Cost of using the cluster. I'm mainly thinking of the container image size, but maybe there are other aspects here, too.

3. Usability/configuration/maintenance. Spark is very simple to use from a recipe, both in code and from the UI, and we'd really like the same to be true of any alternative technology.

4. Anything else important that I'm missing?
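To make the workload concrete, here is a minimal, hedged sketch of the pattern we're after. The model (a Monte Carlo estimate of pi) and all names are hypothetical stand-ins for our simulation code; it uses the standard library's `concurrent.futures` to fan independent simulation runs out over local processes, and the question is essentially which DSS-friendly technology best replaces this single-machine executor with a Kubernetes-backed one:

```python
from concurrent.futures import ProcessPoolExecutor
import random


def run_simulation(seed: int, n_trials: int = 10_000) -> float:
    """Hypothetical compute-intensive model: Monte Carlo estimate of pi."""
    rng = random.Random(seed)
    hits = sum(
        1
        for _ in range(n_trials)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * hits / n_trials


if __name__ == "__main__":
    # Each seed is an independent simulation run; in our real use case
    # we'd want these fanned out across cluster nodes, not local cores.
    seeds = range(8)
    with ProcessPoolExecutor() as pool:
        estimates = list(pool.map(run_simulation, seeds))
    print(sum(estimates) / len(estimates))
```

With PySpark the equivalent would be something like `sc.parallelize(seeds).map(run_simulation).collect()`, but spinning up a full Spark session feels heavyweight for what is embarrassingly parallel Python.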

Thanks in advance!
