Distributed Training Machine Learning
deeplearnyogi
Hi,
What I enjoy about Dataiku is the visual machine learning.
I have a 21 GB dataset to train on and I'd like to try it in Dataiku with XGBoost; however, it will take a while on a single machine.
I have a couple of machines that are connected as an SSH cluster.
Is there any way I can create a Dask SSH cluster in Dataiku so I can use the visual machine learning to train on the data?
In my Jupyter notebook, I create the Dask SSH cluster as follows:
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "192.168.1.119", "192.168.1.191"],
    connect_options={"known_hosts": None, "username": "vinhdiesal"},
    worker_options={"nthreads": 20, "local_directory": "/tmp/"},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
    worker_module="dask_cuda.dask_cuda_worker",
)
client = Client(cluster)
client
Thanks,
Vinh
Answers
Hi,
Thanks for the positive feedback about visual ML in DSS. However, I have to admit that Dask isn't integrated into it in any way. The only way to proceed in DSS is to use notebooks and implement the Dask interaction yourself.
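For reference, here is a minimal sketch of what that notebook code could look like, using XGBoost's own Dask interface (`xgboost.dask`). It is not a DSS feature, only an illustration. The function names `build_xgb_params` and `train_on_dask`, the Parquet input, and the regression objective are assumptions made for the example. It also assumes XGBoost 2.0 or later (where the GPU is selected with `device`), and that the data sits at a path every worker can read.

```python
from typing import Any, Dict


def build_xgb_params(use_gpu: bool, objective: str = "reg:squarederror") -> Dict[str, Any]:
    """Return XGBoost parameters for histogram-based training (hypothetical helper)."""
    params: Dict[str, Any] = {"objective": objective, "tree_method": "hist"}
    if use_gpu:
        # Assumption: XGBoost 2.0+, which selects the GPU via "device".
        # Older releases used tree_method="gpu_hist" instead.
        params["device"] = "cuda"
    return params


def train_on_dask(client, parquet_path: str, label: str,
                  use_gpu: bool = False, num_boost_round: int = 100):
    """Train XGBoost on an existing Dask cluster (hypothetical helper).

    `client` is a dask.distributed.Client, e.g. the one created from SSHCluster.
    `parquet_path` must be readable from every worker (shared or replicated storage).
    """
    # Imported here so build_xgb_params stays usable without dask/xgboost installed.
    import dask.dataframe as dd
    from xgboost import dask as dxgb

    # Lazily read the dataset as a partitioned Dask DataFrame.
    df = dd.read_parquet(parquet_path)
    X = df.drop(columns=[label])
    y = df[label]

    # DaskDMatrix keeps the partitions on the workers that hold them.
    dtrain = dxgb.DaskDMatrix(client, X, y)

    # train() returns a dict with the fitted booster and the evaluation history.
    output = dxgb.train(
        client,
        build_xgb_params(use_gpu),
        dtrain,
        num_boost_round=num_boost_round,
        evals=[(dtrain, "train")],
    )
    return output["booster"], output["history"]


# Example call, with a hypothetical path and label column:
# booster, history = train_on_dask(client, "/shared/data/train.parquet", "target", use_gpu=True)
```

With the `client` from your SSHCluster snippet, training then runs on the workers rather than in the notebook process. The resulting booster is a regular `xgboost.Booster`, so it can be saved with `booster.save_model(...)`.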
