
Alternatives to Spark for plain Python



For our data-intensive recipes, we use PySpark to distribute calculations on a Kubernetes cluster. However, we also have compute-intensive models (e.g. simulation-based ones) that we would like to distribute across multiple machines. My question is whether Spark is still the best way to do this in DSS. Our criteria are:

1. Startup time to begin simulations on the cluster

2. Costs of using the cluster. I'm mainly referring to the container image size, but maybe there are other aspects here, too.

3. Usability/configuration/maintenance. Spark is very simple to use from a recipe, both in the code and from the UI, and we'd really like that to be the case for any other technology.

4. Anything else important that I'm missing?

Thanks in advance!

2 Replies
Developer Advocate


There are a few Python-based frameworks for distributing computation, such as Dask or Ray, but in the realm of data science Spark remains the industry standard, which is why it has first-class integration in the Dataiku platform. All the Spark-related features were designed specifically for Spark, not with the aim of plugging arbitrary distributed computing frameworks into Dataiku. In practice, it's not impossible to make other frameworks work, but it would require a substantial amount of additional work.

Do you have specific tools and/or use cases in mind that you would like to share?






Hi Harizo,

Thanks for your reply. In the meantime, we managed to get Spark to work for our needs. It required some trickery: a typical Spark job performs relatively simple computations on massive data, whereas our case is the opposite, with little data and lengthy computations, so Spark wanted to execute everything on a single node. However, forcing some settings did the trick. We are good now.
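
For anyone landing here with the same problem: the post above does not say which settings were forced, so the following is only a sketch of one common approach, not necessarily what was done here. With a tiny input, Spark tends to create very few partitions, and so very few tasks. Passing an explicit `numSlices` to `SparkContext.parallelize` (here, one partition per simulation) gives the scheduler enough tasks to spread over the executors. `run_simulation`, `run_on_spark` and the Monte Carlo body are hypothetical placeholders for the real model.

```python
import random


def run_simulation(seed, n_steps=10_000):
    """Hypothetical stand-in for a long-running simulation.

    Estimates pi by Monte Carlo sampling; the real model would go here.
    """
    rng = random.Random(seed)  # per-task RNG so each run is reproducible
    inside = sum(
        1 for _ in range(n_steps)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_steps


def run_on_spark(seeds, num_slices=None):
    """Run one simulation per seed, forcing the number of partitions.

    Assumption: pyspark is available, as it is in a PySpark recipe.
    """
    # Imported lazily so run_simulation can be tested without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    # Default to one partition per simulation so that each one becomes
    # its own task instead of all of them landing in a single partition.
    n = num_slices or len(seeds)
    return sc.parallelize(seeds, numSlices=n).map(run_simulation).collect()
```

Calling `run_on_spark(list(range(32)))` would then submit 32 independent tasks. Whether they actually land on different nodes still depends on the executor count and cores configured for the recipe's Spark configuration, which this sketch does not cover.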


