
API to get the Spark config of a recipe

nmadhu20
Level 1

Hi Team,

I'm trying to fetch the Spark configuration of a recipe using the Dataiku Python API, but I can only extract the config for the Spark native engine. I have tried recipe.status.get_selected_engine_details(), but it does not tell me which configuration (default, yarn-large, yarn-extra-large) the Spark engine uses.

Sample code snippet is below.

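import dataiku

client = dataiku.api_client()

project = client.get_project(project_name)
recipes = project.list_recipes()

for recipe_name in recipes:
    # Each item is a dict describing one recipe.
    # This lookup only works for recipes whose params contain this key path.
    spark_config = recipe_name['params']['engineParams']['spark']['sparkConfig']['inheritConf']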

 

Any leads are appreciated. Thank you!

3 Replies
HarizoR
Dataiker

Hi nmadhu20,

In DSS, each recipe running on top of Spark points to a Spark configuration that is accessible through the instance settings. If you want to retrieve the name and details of this configuration, here is a simple way of doing so via the API:

import dataiku

client = dataiku.api_client()
project = client.get_project(YOUR_PROJECT_KEY)

# Get the name of the Spark configuration used by your recipe
rcp_spark_conf = project.get_recipe(YOUR_RECIPE_ID) \
    .get_settings() \
    .raw_params \
    .get("sparkConfig") \
    .get("inheritConf")
print("The Spark configuration for recipe {} is called '{}'".format(YOUR_RECIPE_ID, rcp_spark_conf))

# Retrieve all existing Spark settings at the instance level
instance_spark_confs = client.get_general_settings() \
    .get_raw() \
    .get("sparkSettings") \
    .get("executionConfigs")
    
# Look up the config used by your recipe
target_spark_conf = next(filter(lambda x: x["name"] == rcp_spark_conf,  instance_spark_confs))

# Print the key-value pairs of your Spark execution configuration
target_spark_exec_conf = {x["key"]: x["value"] for x in target_spark_conf["conf"]}
print(target_spark_exec_conf)
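One caveat: next(filter(...)) raises StopIteration when no instance-level configuration matches the name. Here is a minimal sketch of a more tolerant lookup. The helper name find_spark_conf and the sample data are hypothetical. The sample only assumes the same shape as the "executionConfigs" list used above (a "name" plus a "conf" list of key/value pairs).

```python
def find_spark_conf(configs, name):
    """Return {key: value} for the named config, or None if it is absent."""
    # next() with a default avoids StopIteration when nothing matches
    match = next((c for c in configs if c.get("name") == name), None)
    if match is None:
        return None
    return {kv["key"]: kv["value"] for kv in match.get("conf", [])}


# Hypothetical data, shaped like the "executionConfigs" list above
sample_configs = [
    {"name": "default", "conf": [{"key": "spark.executor.memory", "value": "2g"}]},
    {"name": "yarn-large", "conf": [{"key": "spark.executor.memory", "value": "8g"}]},
]

print(find_spark_conf(sample_configs, "yarn-large"))
print(find_spark_conf(sample_configs, "does-not-exist"))
```

With real data you would pass instance_spark_confs and rcp_spark_conf instead of the sample values.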

 

Hope this helps.

Best,

Harizo

HarizoR
Dataiker

Note that you can also get this information visually by selecting the "Spark configurations" view at the bottom left of your Flow screen. DSS will color the Spark-based recipes according to the configuration they use, and you can look up the execution settings in the Administration > Settings > Spark section of your instance.

 


nmadhu20
Level 1
Author

Hi HarizoR,

Thank you for your reply.

I tried the first code sample but ran into two issues:

  1. Not all recipes have a sparkConfig key in their JSON dict, so the chained .get() call throws an error.
  2. I don't have admin privileges to execute the line instance_spark_confs = client.get_general_settings()

Additionally, the visual solution would be easier, but we are building code that needs to store the Spark configs of all recipes in a final dataset. So far we can only extract the sparkConfig details for the spark_native engine (Spark optimized).
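For the first issue, a minimal sketch of a lookup that tolerates a missing sparkConfig key might look like the following. It tries the two key paths mentioned in this thread and returns None when neither is present. The helper names (dig, spark_conf_name, collect_spark_confs) and the sample recipes are hypothetical. It also assumes that raw_params corresponds to the "params" entry of each recipe dict, which is worth checking against your DSS version. It does not address the admin-privileges issue.

```python
def dig(mapping, *keys):
    """Walk nested dicts; return None as soon as a key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# The two key paths that appear in this thread (assumed, not exhaustive)
CANDIDATE_PATHS = [
    ("params", "sparkConfig", "inheritConf"),
    ("params", "engineParams", "spark", "sparkConfig", "inheritConf"),
]


def spark_conf_name(recipe):
    """Return the Spark config name for one recipe dict, or None."""
    for path in CANDIDATE_PATHS:
        name = dig(recipe, *path)
        if name is not None:
            return name
    return None


def collect_spark_confs(recipes):
    """Build one row per recipe, ready to be written to a dataset."""
    return [
        {"recipe": r.get("name"), "spark_conf": spark_conf_name(r)}
        for r in recipes
    ]


# Hypothetical recipe dicts standing in for project.list_recipes()
sample_recipes = [
    {"name": "compute_a", "params": {"sparkConfig": {"inheritConf": "yarn-large"}}},
    {"name": "compute_b", "params": {"engineParams": {"spark": {"sparkConfig": {"inheritConf": "default"}}}}},
    {"name": "compute_c", "params": {}},
]

rows = collect_spark_confs(sample_recipes)
print(rows)
```

Recipes with no Spark configuration end up with spark_conf set to None instead of raising an error, so every recipe still gets a row in the final dataset.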
