Error Running HDFS Command in Python Recipe

jkonieczny Registered Posts: 13 ✭✭✭✭

I have some code where I need to run an HDFS command in Python to check if a file is present. See below for an example:

import subprocess

# List the target directory; the listing shows whether the file is present.
command = 'hdfs dfs -ls /sandbox'

# communicate() returns a (stdout, stderr) tuple; stderr is None here
# because only stdout is piped.
output = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE).communicate()

print(output)

When I run this in a Jupyter notebook in Dataiku, the command completes without any problems. However, when I run the notebook as a Python recipe, I get the following error message multiple times:


java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)];

It looks as if there is a problem with Kerberos when I run the Jupyter notebook as a Recipe. What is the reason for this? Is there a Dataiku setting I can change to make sure the Kerberos ticket is generated properly?
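
For reference, a more direct way to check for existence is hdfs dfs -test -e, which signals the result through the exit code rather than a directory listing; a minimal sketch, using the same /sandbox path as above:

import subprocess

# 'hdfs dfs -test -e PATH' exits with 0 if the path exists, 1 otherwise.
result = subprocess.run(['hdfs', 'dfs', '-test', '-e', '/sandbox'])

print(result.returncode == 0)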

Answers

  • Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753
    Hi,

    Your company is running Dataiku in Multi-User-Security mode. In this mode, Dataiku performs complex interactions with Kerberos to ensure that each activity runs as the end user while Dataiku itself holds only a single credential.

    This interaction means that a plain Python recipe does not receive impersonated Kerberos credentials. That being said, it is possible that a PySpark recipe (rather than a vanilla Python one) would work; you don't actually need to do anything Spark-y in a PySpark recipe. A sketch of what that could look like follows.
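
    A minimal sketch of such a PySpark recipe, assuming a Spark-enabled environment (the /sandbox path is just the example from above). It reaches the Hadoop FileSystem API through the Spark JVM gateway, so the check runs with the credentials impersonated for the job:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # Reach the Hadoop FileSystem API through the JVM gateway.
    jvm = sc._jvm
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())

    # exists() returns True if the path is present on HDFS.
    print(fs.exists(jvm.org.apache.hadoop.fs.Path('/sandbox')))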
  • tomas Registered, Neuron 2022 Posts: 120 ✭✭✭✭✭
    I can confirm that using PySpark is the solution in these cases. The impersonation is handled by Dataiku, so you don't have to worry about keytabs or about running kinit before the command (or cron-ing the kinit for the specific user).
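
    If you do stay with a plain Python recipe, the manual keytab route would look roughly like this (keytab path and principal are placeholders for your environment):

    import subprocess

    # Placeholders: substitute your own keytab and Kerberos principal.
    KEYTAB = '/path/to/user.keytab'
    PRINCIPAL = 'user@EXAMPLE.COM'

    # Obtain a TGT from the keytab, then run the HDFS command.
    subprocess.run(['kinit', '-kt', KEYTAB, PRINCIPAL], check=True)
    result = subprocess.run(['hdfs', 'dfs', '-ls', '/sandbox'], stdout=subprocess.PIPE)
    print(result.stdout.decode())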