Python recipe parallelism & Distributed

Solved!
NagarajuVarkala
Level 1

Hi,

How can I achieve distributed and parallel execution for a Python recipe in Dataiku DSS?

 

1 Solution
arnaudde
Dataiker

Hello,

Note that if you use Dask, you will still need to provide the cluster (Kubernetes, for example) that Dask will leverage. Our support covers DSS itself, but not distribution and parallelization code written with the Dask API, nor the maintenance of such clusters.

Most of our users use Spark for parallelization and distribution. If you enable Spark in DSS, you will be able to use PySpark recipes with dedicated DSS integration: https://doc.dataiku.com/dss/latest/code_recipes/pyspark.html
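For reference, a PySpark recipe in DSS typically follows a skeleton along these lines. This is only a sketch based on the linked documentation: it assumes a DSS runtime with Spark enabled, the dataset names and the filter column are hypothetical stand-ins, and the code will not run outside a DSS PySpark recipe.

```python
# -*- coding: utf-8 -*-
# Sketch of a DSS PySpark recipe; runs only inside DSS with Spark enabled.
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read a DSS input dataset as a Spark DataFrame ("my_input" is hypothetical).
input_dataset = dataiku.Dataset("my_input")
df = dkuspark.get_dataframe(sqlContext, input_dataset)

# Any Spark transformation here is distributed across the cluster executors
# ("value" is a hypothetical column name).
result = df.filter(df["value"] > 0)

# Write the result back to a DSS output dataset ("my_output" is hypothetical).
output_dataset = dataiku.Dataset("my_output")
dkuspark.write_with_schema(output_dataset, result)
```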


Best,
Arnaud

3 Replies
arnaudde
Dataiker

Hello,
You can parallelize and distribute your Python code execution with any Python library designed for that purpose (see, for example, a list of parallel processing Python libraries). You will have to install the library in a code environment and run the recipe with that code environment.
If you have Spark enabled, you can also use the Spark recipes, which distribute and parallelize your execution over a cluster.
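To make the first suggestion concrete, here is a minimal sketch of local parallelism using only the Python standard library, so it works in any code environment with no extra installs. The `transform` function and the chunking parameters are hypothetical stand-ins for your own recipe logic; in a real recipe you would read and write the datasets with the dataiku package instead of generating rows.

```python
from concurrent.futures import ProcessPoolExecutor

def transform(chunk):
    # Placeholder for your per-chunk recipe logic (here: squaring numbers).
    return [x * x for x in chunk]

def run_parallel(rows, n_workers=4, chunk_size=1000):
    # Split the input into chunks and process them on separate CPU cores.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(transform, chunks)
    # Flatten the per-chunk results back into a single list.
    return [row for chunk in results for row in chunk]

if __name__ == "__main__":
    print(run_parallel(list(range(10)), n_workers=2, chunk_size=3))
    # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that this only uses the cores of a single machine; for true multi-node distribution you still need a framework such as Spark or Dask backed by a cluster.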
Best,
Arnaud

NagarajuVarkala
Level 1
Author

Thank you.

Do we get support from the Dataiku team if we integrate Dask with Dataiku? Or does Dataiku have a default framework integrated with DSS? We are using DSS 7.0.2.
