API to get the Spark config of a recipe

nmadhu20 (Neuron, Registered, Neuron 2022, Neuron 2023 · Posts: 35)

Hi Team,

I'm trying to fetch the Spark configuration of a recipe using the Dataiku Python API, but I can only extract the config for recipes running on the Spark-native engine. I have tried recipe.status.get_selected_engine_details(), but it does not tell me which Spark configuration (default, yarn-large, yarn-extra-large) the engine uses.

A sample code snippet is below:

import dataiku

client = dataiku.api_client()

project = client.get_project(project_name)
recipes = project.list_recipes()

for recipe in recipes:
    spark_config = recipe['params']['engineParams']['spark']['sparkConfig']['inheritConf']

Any leads are appreciated. Thank you!

Answers

  • HarizoR (Dataiker, Alpha Tester, Registered · Posts: 138)
    edited July 17

    Hi nmadhu20,

    In DSS, each recipe running on top of Spark points to a Spark configuration that is accessible through the instance settings. If you want to retrieve the name and details of this configuration, here is a simple way of doing so via the API:

    import dataiku

    client = dataiku.api_client()
    project = client.get_project(YOUR_PROJECT_KEY)
    
    # Get the name of the Spark configuration used by your recipe
    rcp_spark_conf = project.get_recipe(YOUR_RECIPE_ID) \
        .get_settings() \
        .raw_params \
        .get("sparkConfig") \
        .get("inheritConf")
    print("The Spark configuration for recipe {} is called '{}'".format(YOUR_RECIPE_ID, rcp_spark_conf))
    
    # Retrieve all existing Spark settings at the instance level
    instance_spark_confs = client.get_general_settings() \
        .get_raw() \
        .get("sparkSettings") \
        .get("executionConfigs")
        
    # Look up the config used by your recipe
    target_spark_conf = next(filter(lambda x: x["name"] == rcp_spark_conf, instance_spark_confs))
    
    # Print the key-value pairs of your Spark execution configuration
    target_spark_exec_conf = {x["key"]: x["value"] for x in target_spark_conf["conf"]}
    print(target_spark_exec_conf)
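
    For illustration, if the recipe points to a configuration named "yarn-large" (one of the names you mentioned), the output of the two print statements would look something like the lines below; the recipe ID and the key-value pairs are hypothetical and depend on how the configuration is defined on your instance:

    The Spark configuration for recipe compute_my_dataset is called 'yarn-large'
    {'spark.executor.memory': '8g', 'spark.executor.cores': '4'}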

    Hope this helps.

    Best,

    Harizo

  • HarizoR (Dataiker, Alpha Tester, Registered · Posts: 138)

    Note that you can also get a better picture visually by selecting the "Spark configurations" view at the bottom left of your Flow screen. DSS will colorize the Spark-based recipes according to the configuration they use, and you can easily look up the execution settings under Administration > Settings > Spark on your instance.

    [Screenshot: the Flow colorized by Spark configuration]

  • nmadhu20 (Neuron, Registered, Neuron 2022, Neuron 2023 · Posts: 35)

    Hi HarizoR,

    Thank you for your reply.

    I tried the first code sample but ran into two issues:

    1. Not all recipes have a sparkConfig key in their JSON dict, so the chained .get() calls throw an error.
    2. I don't have admin privileges, so I cannot execute the line instance_spark_confs = client.get_general_settings().

    Additionally, the visual solution would be easier, but we are building a script that needs to store the Spark configs of all recipes in a final dataset, and so far we can only extract the Spark config details for the spark_native engine (Spark optimized). A guarded version of the loop is sketched below.
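
    To work around both issues, here is a sketch that only needs per-recipe read access and checks both places where sparkConfig appears in this thread (at the top level of the params, as in your snippet, and under engineParams, as in mine). project_key is a placeholder, and pandas is only used to collect the results:

    import dataiku
    import pandas as pd

    client = dataiku.api_client()
    project = client.get_project(project_key)

    rows = []
    for item in project.list_recipes():
        recipe_name = item["name"]
        params = project.get_recipe(recipe_name).get_settings().raw_params or {}
        # sparkConfig may sit at the top level or under engineParams.spark,
        # depending on the recipe type; guard both with empty-dict defaults
        engine_spark = (params.get("engineParams") or {}).get("spark") or {}
        spark_cfg = params.get("sparkConfig") or engine_spark.get("sparkConfig") or {}
        rows.append({
            "recipe": recipe_name,
            "spark_conf": spark_cfg.get("inheritConf"),  # None if not Spark-based
        })

    # Collect everything into a dataframe that can be written to the final dataset
    spark_confs_df = pd.DataFrame(rows)
    print(spark_confs_df)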
