Error Running HDFS Command in Python Recipe

jkonieczny
Level 2

I have some code where I need to run an HDFS command in Python to check if a file is present. See below for an example:

import subprocess

# List the target directory to check whether the file is present
command = 'hdfs dfs -ls /sandbox'
ssh = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()
print(ssh)
When I run this in a Jupyter notebook in Dataiku, the command completes without any problems. However, when I run the notebook as a Python recipe, I get the following error message multiple times:

java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)];

It looks as if there is a problem with Kerberos when I run the Jupyter notebook as a Recipe.  What is the reason for this?  Is there a Dataiku setting I can change to make sure the Kerberos ticket is generated properly?

0 Kudos
2 Replies
Clément_Stenac
Dataiker
Hi,

Your company is running Dataiku in Multi-User-Security mode. In this mode, Dataiku performs a complex interaction with Kerberos to ensure that each activity runs as the end user, while Dataiku itself holds only a single credential.

As a result of this interaction, a plain Python recipe does not have impersonated Kerberos credentials, which is why the HDFS command fails there. That being said, it is possible that a PySpark recipe (rather than a vanilla Python one) would work; you don't need to actually do anything Spark-y in a PySpark recipe.
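
A minimal sketch of such a PySpark recipe might look like the following; no Spark API is actually used, and the /sandbox path is just the one from your example:

# PySpark recipe body: Dataiku provisions impersonated Kerberos credentials
# for Spark activities, so the same hdfs command may succeed here even
# though it fails in a plain Python recipe.
import subprocess

command = 'hdfs dfs -ls /sandbox'
output, _ = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()
print(output)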
0 Kudos
tomas
Level 5
I can confirm that using PySpark in these cases is the solution. The impersonation is handled by Dataiku, so you don't have to worry about keytabs, running a kinit before the command, or scheduling the kinit in cron for the specific user.
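
For example, something like this sketch avoids shelling out entirely by going through the Hadoop FileSystem API exposed via the SparkContext's JVM gateway, reusing the same impersonated credentials (adapt the path as needed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Obtain a FileSystem handle from the job's Hadoop configuration.
hadoop = sc._jvm.org.apache.hadoop.fs
fs = hadoop.FileSystem.get(sc._jsc.hadoopConfiguration())

# /sandbox is the path from the original question; replace as needed.
print(fs.exists(hadoop.Path('/sandbox')))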
0 Kudos
