How to run a Python recipe targeting the containerized configuration and verify it runs on the GKE cluster
Dear Dataiku team,
I am new to Dataiku and currently trying to complete an assignment involving DSS with a GKE cluster.
I have created a sample Python recipe that accesses its dataset via GCS (where a .csv file is saved).
The Python recipe simply copies the input dataset to the output, nothing else.
I have also created a GKE cluster using the plugin in DSS. I can see that the GKE cluster is attached and running on my DSS instance, and I can verify it by listing the nodes with the kubectl command.
However, I am unable to figure out how to run a Python recipe targeting the containerized configuration and confirm through the logs that it is running on the cluster.
I have scanned through the DSS documentation multiple times, but I am still stuck.
Note that I can successfully deploy a container image using the command below, and I can list the pods running on the cluster from the DSS instance.
kubectl create deployment hello-server \
--image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
[vinayak_satapute@dss-instance-centos ~]$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
hello-server-5bd6b6875f-xtqxx   1/1     Running   0          7h7m
[vinayak_satapute@dss-instance-centos ~]$ kubectl get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
gke-new-gke-cluster-node-pool-0-d41f81f9-0d5x   Ready    <none>   8d    v1.22.8-gke.200
gke-new-gke-cluster-node-pool-0-d41f81f9-9s1c   Ready    <none>   8d    v1.22.8-gke.200
Appreciate some kind inputs/guidance.
Thanks.
Operating system used: centos7
Answers
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Hi @vinayaksatapute,
The full list of steps is available here: https://doc.dataiku.com/dss/latest/containers/setup-k8s.html
From what you are describing, you may still need to do the following steps:
1) First, make sure to build the base images:
https://doc.dataiku.com/dss/latest/containers/custom-base-images.html#customization-of-base-images
2) Make sure you have at least one valid containerized execution config defined under Administration - Settings - Containerized Execution - https://doc.dataiku.com/dss/latest/containers/setup-k8s.html#setting-up-containerized-execution-configs
3) Make sure to push the image(s).
Then, when you run a recipe that can be containerized (e.g. Python), you can select the containerized config in the recipe's Advanced tab. If you haven't set containerized execution as the default at the project or instance level, you can select it for the individual recipe:
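As a sketch of steps 1 and 3 above (run on the DSS instance; the data directory path is a placeholder for your own install, and pushing is done from the UI rather than the command line):

```shell
# Step 1: build the base image used for containerized execution.
# Run from your DSS data directory (path below is a placeholder).
cd /path/to/DATADIR
./bin/dssadmin build-base-image --type container-exec

# Step 3: pushing is then done from the DSS UI, under
# Administration - Settings - Containerized Execution ("Push base images"),
# which pushes to the image registry configured in your execution config.
```

This is a sketch under the assumption of a default install layout; see the setup-k8s documentation linked above for the authoritative sequence.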
Let me know if that helps!
-
Dear Alex,
Thanks for your inputs.
I was able to configure containerized execution and push the base image successfully.
However, when I run the Python recipe individually, it successfully builds and completes as a Kubernetes job, as seen in the logs. But I don't see any pod or service running or completed when I check from the DSS instance with the kubectl client. So if I need to run this recipe as a container, do I need to create and push a Docker image for it? Isn't it supposed to run as a container directly from the base image created earlier, or did I miss something?
I also see a particular INFO line in the generated logs, highlighted in the attached RecipeRunLog.jpeg file, with the complete log details in RecipeRun.log.
The recipe build logs are shared here as a screenshot for your review.
Thank you.
-
Alexandru (Dataiker)
The Python recipe did run in your K8s cluster. It created a job and a pod, which were cleaned up once the job completed.
Successful jobs and pods are cleaned up after completion.
If you run the kubectl command while the job is running, you will see the running pods:
kubectl get pods --all-namespaces
Docker is only used to build the base image and the code-env-specific images, which are pushed to your image registry.
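Since the pod only exists for the few seconds the job runs, a filter can help spot it in a busy listing. A small sketch, reading the listing on stdin so it can be piped; the `dataiku-exec-` name prefix is an assumption here, so check the actual pod names in your own cluster and adjust the pattern:

```shell
#!/bin/sh
# Filter a `kubectl get pods --all-namespaces` listing for DSS execution
# pods: keep the header row (NR == 1) plus any row whose NAME column
# (field 2, after NAMESPACE) starts with the assumed DSS prefix.
filter_dss_pods() {
  awk 'NR == 1 || $2 ~ /^dataiku-exec-/'
}

# Usage while the recipe is running (--watch keeps printing updates,
# so the short-lived pod is easier to catch):
#   kubectl get pods --all-namespaces --watch | filter_dss_pods
```

The watch mode is what makes the four-second window catchable without rerunning the command by hand.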
-
Thanks Alex.
Yes, I listed the pods via
kubectl get pods -A
and could see the Python job's pod in that 4-second window.
Appreciate your inputs.