API to get the Spark config of a recipe

nmadhu20
nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron

Hi Team,

I'm trying to fetch the Spark configuration of a recipe using the Dataiku Python API, but I can only extract the config of recipes running on the Spark native engine. I have tried using recipe.status.get_selected_engine_details(), but it does not tell me which Spark configuration (default, yarn-large, yarn-extra-large) the recipe uses.

A sample code snippet is below.

import dataiku

client = dataiku.api_client()

project = client.get_project(project_name)
recipes = project.list_recipes()

for recipe in recipes:
    # Bracket access raises a KeyError for recipes that lack any of these keys
    spark_config = recipe['params']['engineParams']['spark']['sparkConfig']['inheritConf']

Any leads are appreciated. Thank you!

Answers

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker

    Hi nmadhu20,

    In DSS, each recipe running on top of Spark points to a Spark configuration that is accessible through the instance settings. If you want to retrieve the name and details of this configuration, here is a simple way of doing so via the API:

    client = dataiku.api_client()
    project = client.get_project(YOUR_PROJECT_KEY)
    
    # Get the name of the Spark configuration used by your recipe
    rcp_spark_conf = project.get_recipe(YOUR_RECIPE_ID) \
        .get_settings() \
        .raw_params \
        .get("sparkConfig") \
        .get("inheritConf")
    print("The Spark configuration for recipe {} is called '{}'".format(YOUR_RECIPE_ID, rcp_spark_conf))
    
    # Retrieve all existing Spark settings at the instance level
    instance_spark_confs = client.get_general_settings() \
        .get_raw() \
        .get("sparkSettings") \
        .get("executionConfigs")
        
    # Look up the config used by your recipe
    target_spark_conf = next(filter(lambda x: x["name"] == rcp_spark_conf, instance_spark_confs))
    
    # Print the key-value pairs of your Spark execution configuration
    target_spark_exec_conf = {x["key"]: x["value"] for x in target_spark_conf["conf"]}
    print(target_spark_exec_conf)

    Hope this helps.

    Best,

    Harizo

  • HarizoR
    HarizoR Dataiker, Alpha Tester, Registered Posts: 138 Dataiker

    Note that you can get even better results visually by selecting the "Spark configurations" view at the bottom left of your Flow screen. DSS will colorize the Spark-based recipes according to the configuration they use, and you can easily look up the execution settings in the Administration > Settings > Spark section of your instance.

    [Screenshot: Flow with Spark-based recipes colorized by their Spark configuration]
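
    If you need the same overview programmatically, here is a minimal sketch along the same lines. It only reuses the calls from the previous snippet; the project key is a placeholder, and the grouping is just an illustration of what the view shows:

    from collections import defaultdict

    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT_KEY")  # placeholder project key

    # Bucket recipe names by the Spark configuration they inherit; this is
    # the same information the "Spark configurations" view encodes as colors
    recipes_by_conf = defaultdict(list)
    for item in project.list_recipes():
        params = project.get_recipe(item["name"]).get_settings().raw_params
        conf_name = (params.get("sparkConfig") or {}).get("inheritConf")
        recipes_by_conf[conf_name or "<no Spark config>"].append(item["name"])

    for conf_name, recipe_names in sorted(recipes_by_conf.items()):
        print("{}: {}".format(conf_name, recipe_names))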

  • nmadhu20
    nmadhu20 Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 35 Neuron

    Hi HarizoR,

    Thank you for your reply.

    I tried the first code sample but ran into two issues:

    1. Not all recipes have a sparkConfig key in their JSON dict, so the get() line throws an error.
    2. I don't have admin privileges to execute the line instance_spark_confs = client.get_general_settings().

    Additionally, the visual solution would be easier, but we are building code that needs to store the Spark configs of all recipes in a final dataset (sketch below). So far we can only extract the sparkConfig details of recipes using the Spark engine (Spark native / Spark-optimized).
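
    For context, here is a sketch of what we are trying to build (the project key is a placeholder; the engineParams fallback mirrors the path from my original snippet, and falling back to an empty dict avoids calling get() on None):

    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT_KEY")  # placeholder

    rows = []
    for item in project.list_recipes():
        name = item["name"]
        params = project.get_recipe(name).get_settings().raw_params

        # Some recipes expose "sparkConfig" at the top level of their params,
        # others nest it under engineParams.spark; non-Spark recipes have neither
        spark_cfg = params.get("sparkConfig") \
            or params.get("engineParams", {}).get("spark", {}).get("sparkConfig", {})

        # spark_cfg is always a dict here, so get() never runs on None
        rows.append({"recipe": name, "spark_conf": spark_cfg.get("inheritConf")})

    print(rows)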
