API to get the Spark config of a recipe
Hi Team,
I'm trying to fetch the Spark configuration of a recipe using the Dataiku Python API, but I can only extract the config of the Spark-native engine. I have tried using recipe.status.get_selected_engine_details(), but it does not tell me which configuration (default, yarn-large, yarn-extra-large) the Spark engine is using.
A sample code snippet is below.
import dataiku

client = dataiku.api_client()
project = client.get_project(project_name)
recipes = project.list_recipes()
for recipe in recipes:
    # Only recipes running on a Spark engine expose these keys
    spark_config = recipe['params']['engineParams']['spark']['sparkConfig']['inheritConf']
Any leads are appreciated. Thank you!
Answers
-
Hi nmahdu20,
In DSS, each recipe running on top of Spark points to a Spark configuration that is accessible through the instance settings. If you want to retrieve the name and details of this configuration, here is a simple way of doing so via the API:
client = dataiku.api_client()
project = client.get_project(YOUR_PROJECT_KEY)

# Get the name of the Spark configuration used by your recipe
rcp_spark_conf = project.get_recipe(YOUR_RECIPE_ID) \
    .get_settings() \
    .raw_params \
    .get("sparkConfig") \
    .get("inheritConf")
print("The Spark configuration for recipe {} is called '{}'".format(YOUR_RECIPE_ID, rcp_spark_conf))

# Retrieve all existing Spark settings at the instance level
instance_spark_confs = client.get_general_settings() \
    .get_raw() \
    .get("sparkSettings") \
    .get("executionConfigs")

# Look up the config used by your recipe
target_spark_conf = next(filter(lambda x: x["name"] == rcp_spark_conf, instance_spark_confs))

# Print the key-value pairs of your Spark execution configuration
target_spark_exec_conf = {x["key"]: x["value"] for x in target_spark_conf["conf"]}
print(target_spark_exec_conf)
Hope this helps.
Best,
Harizo
-
Note that you can even get better results visually, by selecting the "Spark configurations" view at the bottom left of your Flow screen. DSS will colorize the Spark-based recipes according to the configuration they are using, and you can easily look up the execution settings in the Administration > Settings > Spark section of your instance.
-
Hi HarizoR,
Thank you for your reply.
I tried the first code sample but ran into two issues:
- Not all recipes have a sparkConfig key in their JSON dict, so the get() line throws an error.
- I don't have admin privileges to execute the line instance_spark_confs = client.get_general_settings().
Additionally, the visual solution would be easier, but we are building code that needs to store the Spark configs of all recipes in a final dataset. So far we are only able to extract Spark config details for recipes using the Spark engine (Spark-optimized); a workaround sketch for both issues follows below.
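For anyone hitting the same two problems, here is a minimal sketch that tolerates the missing sparkConfig key and avoids the admin-only client.get_general_settings() call by reading only each recipe's own settings. It reuses raw_params from the answer above and the dict-style access on list_recipes() items from the original snippet; "YOUR_PROJECT_KEY" is a placeholder, and the exact return shape of list_recipes() may vary by DSS version.

import dataiku

client = dataiku.api_client()
project = client.get_project("YOUR_PROJECT_KEY")  # placeholder project key

recipe_spark_confs = {}
for item in project.list_recipes():
    settings = project.get_recipe(item["name"]).get_settings()
    # Recipes that do not run on Spark have no "sparkConfig" key,
    # so fall back to an empty dict instead of raising a KeyError
    conf_name = (settings.raw_params.get("sparkConfig") or {}).get("inheritConf")
    recipe_spark_confs[item["name"]] = conf_name

# Non-Spark recipes end up with None; the rest carry their config name,
# e.g. 'default', 'yarn-large', 'yarn-extra-large'
print(recipe_spark_confs)

The resulting dict can then be written to your final dataset; only the instance-level lookup of each configuration's key-value pairs still requires admin access.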