
Error Running HDFS Command in Python Recipe


I have some code where I need to run an HDFS command in Python to check if a file is present.  See below for an example:


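import subprocess

# List the HDFS directory to check whether a file is present
command = 'hdfs dfs -ls /sandbox'

# communicate() waits for the command and returns (stdout, stderr);
# stderr is None here because only stdout is piped
ssh = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()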



When I run this in a Jupyter notebook in Dataiku, the command completes without any problems.  However, when I run the notebook as a Python recipe, I get the following error message multiple times: Failed on local exception: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)];

It looks as if there is a problem with Kerberos when I run the Jupyter notebook as a Recipe.  What is the reason for this?  Is there a Dataiku setting I can change to make sure the Kerberos ticket is generated properly?

2 Replies

Your company is running Dataiku in Multi-User-Security mode. In this mode, Dataiku performs a complex interaction with Kerberos to ensure that each activity runs as the end user, even though Dataiku itself holds only a single credential.

Because of this interaction, the Python recipe does not have impersonated credentials. That said, a PySpark recipe (rather than a vanilla Python one) may work; you don't actually need to do anything Spark-y in a PySpark recipe.
0 Kudos
Level 5
I can confirm that using PySpark is the solution in these cases. Dataiku handles the impersonation for you, so you don't have to worry about keytabs, run kinit before the command, or schedule kinit in cron for the specific user.
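
For the file-presence check from the original question, here is a minimal sketch of what the body of such a recipe might look like. It assumes the hdfs client is on the PATH of the recipe's environment and uses `hdfs dfs -test -e`, which exits with 0 when the path exists. The function name `hdfs_path_exists` and the `hdfs_cmd` parameter are hypothetical and only there for illustration; raising on "GSSException" is simply a way to keep the Kerberos failure quoted above from being mistaken for a missing file.

```python
import subprocess


def hdfs_path_exists(path, hdfs_cmd=("hdfs", "dfs")):
    """Return True if `path` exists in HDFS (hypothetical helper).

    `hdfs_cmd` is an illustrative parameter: the command prefix to run,
    by default the `hdfs dfs` client assumed to be on the PATH.
    """
    # Passing a list instead of shell=True avoids shell quoting issues in `path`.
    proc = subprocess.Popen(
        list(hdfs_cmd) + ["-test", "-e", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _, err = proc.communicate()
    err_text = err.decode("utf-8", "replace")

    # A missing Kerberos ticket also produces a non-zero exit code, so report
    # it explicitly instead of returning "file not found".
    if "GSSException" in err_text:
        raise RuntimeError("Kerberos authentication failed: " + err_text.strip())

    # Exit code 0 means the path exists.
    return proc.returncode == 0
```

Calling `hdfs_path_exists("/sandbox")` would then return True or False, or raise if the recipe still has no valid Kerberos ticket.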