General Guidance on Scenario Steps and Python

indy2005
edited July 2024 in Using Dataiku

Hi,

The documentation seems quite sparse on this. If there is a good one-stop document covering it, I would appreciate a pointer.

I am trying to understand the right way to:

  • programmatically set the outcome of a step to success or failure using Python, without having to rely on setting variables. How, via Python, do we set the step outcome?
  • programmatically GET the outcome of the previous step (success or failure) via Python
  • What is the correct way to access autogenerated variables from previous steps, such as the outcome, in Python? For example, if I have a SQL step and need to access its output in Python, how do I do it? The code below works but seems very verbose, so I feel I am over-engineering it.
  • How do I access an outcome or output from a previous step using the expression language (for example, the output of a SQL step)? Do I have to use parseJson?
  • How do I set the output of a Python step (not the outcome, but perhaps some data to pass on without using variables)?
  • There is a set_scenario_variables() but no get_scenario_variables(), so I just use get_custom_variables(). Are the variables scoped to the scenario?

Any help on this would be appreciated, even if it is just links to relevant docs.

At the moment the code I have for accessing the output of the SQL step is below.

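from dataiku.scenario import Scenario
import dataiku

s = Scenario()

print('Python step ran... ETL date must be today.')
# Outputs of all steps that ran before this one in the scenario
outputs = s.get_previous_steps_outputs()
# Keep only the result of the SQL step named 'CheckSapphireETLDate'
sql_output = [o['result'] for o in outputs if o['stepName'] == 'CheckSapphireETLDate'][0]
# First column of the first row returned by the SQL step
s.set_scenario_variables(sapphire_etl_date=sql_output['rows'][0][0])
# Formatting with an f-string avoids a TypeError if the value is not a string
print(f"sapphire_etl_date = {dataiku.get_custom_variables()['sapphire_etl_date']}")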
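
As a sketch only, the lookup above could be wrapped in a small helper that fails with a clear message when the step name is not found. `find_step_result` is a hypothetical name, not part of the Dataiku API, and `sample_outputs` is made-up data that only mimics the shape used in the code above (`stepName`, `result`, `rows`).

```python
def find_step_result(outputs, step_name):
    """Return the 'result' of the first step output whose 'stepName' matches."""
    for output in outputs:
        if output.get("stepName") == step_name:
            return output["result"]
    raise KeyError(f"No previous step named {step_name!r}")


# Made-up sample data standing in for Scenario().get_previous_steps_outputs()
sample_outputs = [
    {"stepName": "SomeOtherStep", "result": {"rows": [["ignored"]]}},
    {"stepName": "CheckSapphireETLDate", "result": {"rows": [["2024-07-01"]]}},
]

# First column of the first row of the matching step's result
etl_date = find_step_result(sample_outputs, "CheckSapphireETLDate")["rows"][0][0]
print(etl_date)
```

In a real Python step, the list returned by `s.get_previous_steps_outputs()` would be passed in place of `sample_outputs`.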


Answers

  • Marlan

    Hi @indy2005,

    I thought I'd share my experiences as they relate to your questions. These are not complete answers, but they may still be helpful.

    I believe I explored programmatically setting the outcome of the step using Python but wasn't able to figure out a solution. One can simply raise a Python exception to indicate a failure (and that's what I do when I want to fail the step), but that's a bit different from setting the result variable directly.
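
The "raise an exception to fail the step" approach might look roughly like the sketch below. `check_etl_date` is a hypothetical helper written for illustration, and the date comparison is an assumed example of a check; nothing here is Dataiku-specific.

```python
import datetime


def check_etl_date(etl_date, today):
    """Return the ETL date if it equals today's date, otherwise raise."""
    if etl_date != today:
        raise ValueError(f"ETL date {etl_date} is not today ({today})")
    return etl_date


today = datetime.date.today()

# Matching dates: no exception is raised, so execution simply continues
check_etl_date(today, today)

# A stale date raises ValueError; it is caught here only to show the message
try:
    check_etl_date(today - datetime.timedelta(days=1), today)
except ValueError as err:
    print(f"Check failed: {err}")
```

Inside a scenario Python step, the exception would be left uncaught so that the step fails, as described above.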

    I ended up using a separate variable that I created and set to a defined value in the first step of the scenario. Not ideal, but it accomplished what I was trying to do, which was to track multiple outcomes and then generate a different email for each outcome.

    The Scenario object's get_all_variables() method returns both project and scenario variables within a Python step.

    To get the output of a SQL script in Python, I would just execute it within the Python step rather than as a separate step. See documentation here: https://doc.dataiku.com/dss/latest/python-api/sql.html

    Here's an example of getting SQL step output using the formula language: parseJson(stepOutput_TimelinessCheck)['rows'][0][0]. In this example, "TimelinessCheck" is the label given to the step.
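
For readers more at home in Python, the navigation in that formula corresponds roughly to the sketch below. The JSON text is a made-up stand-in for the step output; only the `rows` key is taken from the examples in this thread, and the values are invented.

```python
import json

# Hypothetical stand-in for the text held in stepOutput_TimelinessCheck
step_output_text = '{"rows": [["2024-07-01", 42]]}'

# json.loads plays the role of parseJson(...) in the formula
parsed = json.loads(step_output_text)

# ['rows'][0][0] selects the first column of the first row
first_value = parsed["rows"][0][0]
print(first_value)
```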

    I don't know how you would set the output of a Python step other than via a variable. It may be possible; I just don't know.

    Hope this helps.

    Marlan

  • indy2005

    Thanks. Very helpful. I think I have the pattern down now for setting variables and using them in expressions as conditions in future steps. I assume though that it must be possible to set the outcome of the step via Python to success or failure, rather than creating your own variables and using them in expressions.
