
Error Running HDFS Command in Python Recipe

jkonieczny
Level 2

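I have some code where I need to run an HDFS command in Python to check if a file is present. See below for an example:

import subprocess

command = 'hdfs dfs -ls /sandbox'
# Run the command through the shell and capture its standard output
ssh = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()
# communicate() returns a (stdout, stderr) tuple; stderr is None because it is not piped
print(ssh)
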
When I run this in a Jupyter notebook in Dataiku, the command completes without any problems.  However, when I run the notebook as a Python recipe, I get the following error message multiple times:




java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)];


It looks as if there is a problem with Kerberos when I run the Jupyter notebook as a Recipe.  What is the reason for this?  Is there a Dataiku setting I can change to make sure the Kerberos ticket is generated properly?

2 Replies
Clément_Stenac
Dataiker
Hi,

Your company is running Dataiku in Multi-User-Security mode. In this mode, Dataiku performs a complex interaction with Kerberos to ensure that each activity runs as the end user, even though Dataiku itself holds only a single credential.

As a result of this interaction, a plain Python recipe does not have impersonated credentials. That said, a PySpark recipe (rather than a vanilla Python one) may work; you don't need to actually do anything Spark-y in a PySpark recipe.
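Whichever recipe type ends up running the command, here is a minimal sketch of the file-presence check from the question. It uses `hdfs dfs -test -e`, which exits with status 0 when the path exists, instead of parsing `-ls` output. The function name, the injectable `run` parameter, and the assumption that the Kerberos failure shows up as `GSSException` on stderr are illustrative, not something Dataiku provides.

```python
import subprocess


def hdfs_path_exists(path, run=subprocess.run):
    """Return True if `hdfs dfs -test -e <path>` exits with status 0.

    `run` is a hypothetical injection point so the function can be
    exercised without a cluster; by default it is subprocess.run.
    """
    result = run(
        ["hdfs", "dfs", "-test", "-e", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Assumption: a missing Kerberos ticket is reported on stderr with
    # "GSSException", as in the error quoted in the question.
    if result.returncode != 0 and b"GSSException" in result.stderr:
        # A credentials failure is not the same as "file missing".
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.returncode == 0


# Example call (needs the hdfs CLI and valid credentials):
# print(hdfs_path_exists("/sandbox"))
```

Raising on the Kerberos error keeps a credentials problem from being silently reported as "file not found".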
tomas
Neuron
I can confirm that using PySpark is the solution in these cases. The impersonation is handled by Dataiku in this case, so you don't have to worry about keytabs or run a kinit before the command (or set up a cron job to run kinit for the specific user).
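For comparison, the manual route that PySpark lets you avoid would look roughly like the sketch below: check for a valid ticket with `klist -s` and, if there is none, obtain one with `kinit -kt`. The function names, the keytab path, and the principal are hypothetical, and whether the recipe's user can read a keytab at all under Multi-User-Security is an assumption about your setup.

```python
import subprocess


def has_valid_ticket(run=subprocess.run):
    """`klist -s` prints nothing and exits 0 only if a valid ticket cache exists."""
    return run(["klist", "-s"]).returncode == 0


def ensure_ticket(keytab, principal, run=subprocess.run):
    """Obtain a ticket from a keytab if none is valid.

    Returns True if kinit was run, False if a ticket was already present.
    `run` is a hypothetical injection point for testing without Kerberos.
    """
    if has_valid_ticket(run):
        return False
    # check=True raises CalledProcessError if kinit fails.
    run(["kinit", "-kt", keytab, principal], check=True)
    return True


# Example call (keytab path and principal are placeholders):
# ensure_ticket("/path/to/user.keytab", "user@EXAMPLE.COM")
```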