Distributed Training Machine Learning

deeplearnyogi Registered Posts: 9 ✭✭✭✭
edited July 16 in Using Dataiku

Hi,

What I enjoy about Dataiku is the visual machine learning.

I have a 21 GB dataset to train on, and I'd like to try it in Dataiku with XGBoost, but it will take a while.

I have a couple of machines that are connected as an SSH cluster.

Is there any way I can create a Dask SSH cluster in Dataiku so that I can use visual machine learning to train on the data?

In my Jupyter notebook, I create the Dask SSH cluster as follows:

from dask.distributed import Client, SSHCluster

# The first host runs the scheduler; the remaining hosts run workers.
cluster = SSHCluster(
    ["localhost", "192.168.1.119", "192.168.1.191"],
    connect_options={"known_hosts": None, "username": "vinhdiesal"},
    worker_options={"nthreads": 20, "local_directory": "/tmp/"},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
    worker_module="dask_cuda.dask_cuda_worker",  # start GPU workers via dask-cuda
)
client = Client(cluster)
client

Thanks,

Vinh

Answers

  • Andrey
    Andrey Dataiker Alumni Posts: 119 ✭✭✭✭✭✭✭

    Hi,

    Thanks for the positive feedback about visual ML in DSS. However, I have to admit that Dask isn't integrated into it in any way. The only way to proceed in DSS is to use notebooks and implement the Dask interaction yourself.
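For the notebook route, here is a rough sketch of what "implementing the Dask interaction yourself" could look like with XGBoost's own Dask interface. It is not a DSS feature, only an illustration. It assumes the `client` from the question is already connected, that the data is stored as Parquet at a path every worker can read, and that XGBoost 2.0 or later is installed (for the `device` parameter). The names `xgb_params`, `train_with_dask`, `parquet_path` and `label` are hypothetical.

```python
def xgb_params(use_gpu: bool = True) -> dict:
    """Hypothetical helper: a minimal XGBoost parameter set."""
    params = {"tree_method": "hist"}
    if use_gpu:
        # Assumes XGBoost >= 2.0; older releases used tree_method="gpu_hist".
        params["device"] = "cuda"
    return params


def train_with_dask(client, parquet_path, label, num_boost_round=100, use_gpu=True):
    """Train an XGBoost model on a Dask cluster and return the booster.

    `client` is a connected dask.distributed.Client; `parquet_path` and
    `label` are placeholders for your own dataset and target column.
    """
    # Imported here so the function can be defined before the libraries are needed.
    import dask.dataframe as dd
    from xgboost import dask as dxgb

    # Lazily read the dataset; partitions are loaded on the workers.
    df = dd.read_parquet(parquet_path)
    X = df.drop(columns=[label])
    y = df[label]

    # DaskDMatrix keeps the data distributed across the cluster.
    dtrain = dxgb.DaskDMatrix(client, X, y)
    result = dxgb.train(
        client,
        xgb_params(use_gpu),
        dtrain,
        num_boost_round=num_boost_round,
    )
    # result is a dict holding the trained booster and the evaluation history.
    return result["booster"]
```

In a notebook you would then call something like `booster = train_with_dask(client, "/shared/data.parquet", "target")`, where the path and column name are placeholders for your own data.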
