Support for Dask distributed jobs?

rmnvncnt

Hello,

I'm currently evaluating the various engines available in DSS, and I was wondering whether Dask is something Dataiku is currently working on.

We tried PySpark in the past, but it might be overkill for our use case (we have thousands of small partitions), and we never really managed to get it running anyway. Dask seems better suited to small- and medium-sized jobs, without the Hadoop overhead.

Any thoughts about it?

Best,

Romain

Best Answer

  • Clément_Stenac (Dataiker)

    Hi,

    We studied using Dask for our Visual ML in the past, but we ran into various stability issues that led us to rule it out for that specific use.

    We are not currently considering adding Dask as an execution engine for visual recipes in Dataiku.

    However, you should be able to use Dask as you wish in Python recipes and notebooks. Note that you will still need to provide the cluster (Kubernetes, for example) that Dask runs on.
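    As a minimal sketch, assuming Dask is installed in the recipe's code environment, the many-small-partitions workload from the question can be expressed as one `dask.delayed` task per partition. `process_partition`, `run_all` and `SCHEDULER_ADDRESS` are hypothetical names. Leaving `SCHEDULER_ADDRESS` as `None` runs the tasks on Dask's local threaded scheduler; setting it to the address of a scheduler on the cluster you provide (this path also needs `dask.distributed`) sends them there instead.

```python
import dask

# Hypothetical placeholder: address of a Dask scheduler running on your own
# cluster, e.g. "tcp://<host>:8786". None means "run locally with threads".
SCHEDULER_ADDRESS = None


def process_partition(partition_id: str) -> int:
    # Hypothetical per-partition work. In a real recipe this might read and
    # transform one small DSS partition; here it just returns a number.
    return len(partition_id)


def run_all(partition_ids):
    # Build one lazy task per partition; nothing executes yet.
    tasks = [dask.delayed(process_partition)(pid) for pid in partition_ids]
    if SCHEDULER_ADDRESS is None:
        # Local execution on a thread pool, no cluster required.
        return list(dask.compute(*tasks, scheduler="threads"))
    # Remote execution: while the Client is open it becomes the default
    # scheduler, so dask.compute sends the tasks to the cluster.
    from dask.distributed import Client
    with Client(SCHEDULER_ADDRESS):
        return list(dask.compute(*tasks))
```

    With thousands of very small partitions, per-task scheduling overhead can dominate, so grouping several partitions into each task is a common adjustment.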

Answers

  • rmnvncnt

    I understand it might be tricky to include Dask as an engine in the DSS backend, but would it be possible to allow reading into Dask data structures through the API nonetheless? For instance, a dataiku.Dataset.get_dask_dataframe() method (returning a handle to a Dask DataFrame) alongside the existing dataiku.Dataset.get_dataframe(), which returns a pandas DataFrame.

  • Sampathvinta

    Restarting a very old thread, but I have the same question about Dask: are there any plans to provide a Dask DataFrame handle instead of a pandas DataFrame handle?

    Alternatively, if I get a pandas DataFrame and then convert it to Dask, would that give me the same experience as getting a Dask DataFrame directly from a Dataiku dataset?
