
How to run a Python recipe targeting the containerized configuration and verify that it runs on the GKE cluster

vinayaksatapute
Level 1

Dear Dataiku team,

I am new to Dataiku and am trying to complete an assignment that uses DSS with a GKE cluster.

I have created a sample Python recipe whose input dataset is read from GCS (where a .csv file is saved).

The Python recipe only copies the input dataset to the output.

I have also created a GKE cluster using the plugin in DSS. I can see that the GKE cluster is attached and running on my DSS instance, and I can verify it by listing the nodes with kubectl.

However, I cannot figure out how to run a Python recipe targeting the containerized configuration and confirm from the logs that it is running on the cluster.

I have gone through the DSS documentation several times, but I am still stuck.

Note that I can successfully deploy a container image using the command below, and I can list the pods running on the cluster from the DSS instance.

kubectl create deployment hello-server \
    --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0

 

 [vinayak_satapute@dss-instance-centos ~]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-server-5bd6b6875f-xtqxx 1/1 Running 0 7h7m

 

[vinayak_satapute@dss-instance-centos ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-new-gke-cluster-node-pool-0-d41f81f9-0d5x Ready <none> 8d v1.22.8-gke.200
gke-new-gke-cluster-node-pool-0-d41f81f9-9s1c Ready <none> 8d v1.22.8-gke.200

 

I would appreciate any input or guidance.

Thanks.


Operating system used: centos7

4 Replies
AlexT
Dataiker

Hi @vinayaksatapute ,

The full list of steps is available here: https://doc.dataiku.com/dss/latest/containers/setup-k8s.html

From what you are describing, you may still need to complete the following steps:

1) First, make sure to build the base images:

https://doc.dataiku.com/dss/latest/containers/custom-base-images.html#customization-of-base-images

https://doc.dataiku.com/dss/latest/containers/code-envs.html#using-code-envs-with-containerized-exec... 

2) Make sure you have at least one valid containerized execution config defined under Administration - Settings - Containerized Execution: https://doc.dataiku.com/dss/latest/containers/setup-k8s.html#setting-up-containerized-execution-conf...

3) Make sure to push the image(s).
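
As a rough sketch of step 1, the base image for containerized execution is typically built from the DSS data directory with the dssadmin tool. The DSS_DATADIR variable and the function name below are placeholders chosen for illustration, and the exact options should be checked against the documentation for your DSS version.

```shell
#!/bin/sh
# DSS_DATADIR is a placeholder (an assumption): point it at your own DSS data directory.
DSS_DATADIR="${DSS_DATADIR:-/path/to/dss_datadir}"

# Build the base image used for containerized execution.
# Run this as the user that owns the DSS installation, on the DSS host.
build_container_base_image() {
  "$DSS_DATADIR/bin/dssadmin" build-base-image --type container-exec
}
```

Calling build_container_base_image on the DSS host starts the build. Pushing the image to the registry is a separate step (step 3).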

Then, when you run a recipe that can be containerized (e.g. Python), it will use the containerized config if you have set one as the default at the project or instance level. If you have not, you can select the containerized config for the individual recipe in its Advanced tab:

Screenshot 2022-04-15 at 12.16.37.png

Let me know if that helps!

vinayaksatapute
Level 1
Author

Dear Alex,

Thanks for your inputs.

I was able to configure containerized execution and push the base image successfully.

However, when I run the Python recipe individually, it builds successfully and completes as a Kubernetes job, as seen in the logs. But I don't see any pod, running or completed, when I check from the DSS instance using the kubectl client. So if I need to run this recipe as a container, do I need to create a Docker image for it and push that? Isn't it supposed to run as a container directly from the base image created earlier? Or did I miss something?

I also see a particular INFO message in the logs, highlighted in the attached RecipeRunLog.jpeg file.

The complete log details are in RecipeRun.log.

The recipe build logs are shared here as a screenshot for your review.

Thank you.

 
 

 

 

AlexT
Dataiker

The Python recipe ran in your K8s cluster. It created a pod and job which were cleaned up once the job completed.

Successful jobs and pods are cleaned up after completion.

If you run the following kubectl command while the job is running, you will see the running pods:

kubectl get pods --all-namespaces

Docker is only used to build the base image and the code-env-specific images, which are then pushed to your image registry.
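
Because the pod only exists for a short time, it can be easier to watch for it than to list pods once. The sketch below wraps two standard kubectl commands. The KUBECTL variable and the function names are illustrative only and are not defined by DSS or kubectl.

```shell
#!/bin/sh
# KUBECTL is an illustrative override hook (an assumption); it defaults to the real client.
KUBECTL="${KUBECTL:-kubectl}"

# Stream pod changes across all namespaces. Start this before running the
# recipe in DSS and stop it with Ctrl-C once the job has finished.
watch_pods() {
  "$KUBECTL" get pods --all-namespaces --watch
}

# After the pod has been cleaned up, recent cluster events may still mention it.
# Events are only retained for a limited time, so check soon after the run.
recent_events() {
  "$KUBECTL" get events --all-namespaces --sort-by=.metadata.creationTimestamp
}
```

Run watch_pods in one terminal and then start the recipe from DSS. The recipe's pod should appear and then disappear in the stream as the job completes.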

vinayaksatapute
Level 1
Author

Thanks Alex.

Yes, I listed the pods with

kubectl get pods -A

and could see the Python job during that 4-second window.

I appreciate your input.

 
