Python recipe parallelism & distribution
Hi,
How can I achieve distributed and parallel execution for a Python recipe in Dataiku DSS?
Best Answer
Hello,
Note that if you use Dask, you will still need to provide the cluster (Kubernetes, for example) that Dask will run on. Our support covers DSS itself, but neither the distribution and parallelization code written with the Dask API nor the maintenance of the clusters.
Most of our users rely on Spark for parallelization and distribution. If you enable Spark in DSS, you will be able to use PySpark recipes with a dedicated DSS integration: https://doc.dataiku.com/dss/latest/code_recipes/pyspark.html
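To illustrate, a minimal sketch of what a PySpark recipe in DSS typically looks like. The dataset names ("my_input", "my_output") and the "category" column are placeholders; this code only runs inside a DSS PySpark recipe, where the `dataiku` package and a Spark session are available.

```python
# Sketch of a DSS PySpark recipe (runs inside DSS only).
# "my_input", "my_output" and "category" are placeholder names.
import dataiku
import dataiku.spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read a DSS dataset as a Spark DataFrame
input_ds = dataiku.Dataset("my_input")
df = dkuspark.get_dataframe(sqlContext, input_ds)

# Transformations are distributed and parallelized over the Spark cluster
result = df.groupBy("category").count()

# Write the result back to a DSS dataset
output_ds = dataiku.Dataset("my_output")
dkuspark.write_with_schema(output_ds, result)
```

The key point is that DSS handles the dataset I/O (`get_dataframe` / `write_with_schema`), while Spark distributes the transformations in between.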
Best,
Arnaud
Answers
Hello,
You can parallelize and distribute your Python code execution with any Python library designed for that purpose (see lists of parallel-processing Python libraries). You will have to install the library in a code environment and run the recipe with that code environment.
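For single-node parallelism inside a Python recipe, the standard library alone is enough, so no code environment changes are needed. A minimal sketch using `concurrent.futures`; the `transform` function and its inputs are placeholders for your own per-chunk work.

```python
# Minimal sketch: single-node parallelism with the standard library.
# transform() is a placeholder for a CPU-bound computation on one chunk of data.
from concurrent.futures import ProcessPoolExecutor

def transform(x):
    # Placeholder work: square one item
    return x * x

def parallel_map(items, max_workers=4):
    # Spread the chunks over a pool of worker processes on the DSS node
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transform, items))

if __name__ == "__main__":
    print(parallel_map(range(5)))  # [0, 1, 4, 9, 16]
```

This parallelizes across the cores of a single machine only; for distribution over multiple machines you would need Spark or a library such as Dask, as discussed above.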
If you have Spark enabled, you can also use the Spark recipes, which distribute and parallelize your execution over a cluster.
Best,
Arnaud
Thank you.
Do we get Dataiku team support if we integrate Dask with Dataiku? Or does Dataiku have any default framework integrated with DSS? We are using DSS 7.0.2.