Hello Dataiku Community,
Hope all is well!
Our team is looking to implement new Spark and container configuration settings on our instances, and we are curious what the best practices are for updating the existing configurations. For context, we have existing Spark configurations already in use by end users; however, we would like to replace them with net-new settings and naming conventions.
As a test, we created a net-new Spark configuration on one of our dedicated "Dev" instances (a design node) and tested what happens when the configuration is renamed. We saw that if we rename the configuration, everything that explicitly referenced the old name is converted to "Nothing Selected". Please see the example files for before and after pictures. Our before config was named "Large_9GBMem_11Exec_new", and we updated the name to "Large_9GBMem_11Exec". In the after picture, however, the selection is now set to "Nothing Selected". Is there a way to have the selection default to the new name ("Large_9GBMem_11Exec" in this example), or is this behavior expected?
I found the documentation and community discussions below about systematically checking/updating the Spark settings via a Python script, and wanted to confirm whether leveraging the API is the best practice for systematically updating the Spark configs, or whether there is another way to update the configs/config names through the UI automatically.
Appreciate the feedback!
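Since a rename leaves recipes that explicitly referenced the old name pointing at nothing, one workaround is to script the remapping yourself. Below is a minimal sketch of the idea as a hypothetical helper operating on the raw params dict that a recipe's settings expose; the "sparkConfig"/"inheritConf" keys follow the structure discussed later in this thread, and the config names are the ones from the screenshots.

```python
# Hypothetical helper: remap an old Spark config name to a new one inside
# a recipe's raw params dict. The nested keys mirror what
# recipe.get_settings() exposes under ['params']['sparkConfig'].
def remap_inherit_conf(params, old_name, new_name):
    spark_conf = params.get("sparkConfig", {})
    if spark_conf.get("inheritConf") == old_name:
        spark_conf["inheritConf"] = new_name
        return True   # a change was made; the caller should save the settings
    return False      # nothing referenced the old name

# Example using the config names from the question
params = {"sparkConfig": {"inheritConf": "Large_9GBMem_11Exec_new"}}
remap_inherit_conf(params, "Large_9GBMem_11Exec_new", "Large_9GBMem_11Exec")
print(params["sparkConfig"]["inheritConf"])  # Large_9GBMem_11Exec
```

Running a helper like this across recipes before (or instead of) deleting the old named config would avoid the "Nothing Selected" state.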
Hi @kathyqingyuxu, we are in the same scenario as you. We have a few new Spark configs, and using the Python API to get and set SparkSQL and PySpark recipe settings is simple, akin to something like:
for r in recipes:
    recipe = proj.get_recipe(r['name'])
    sets = recipe.get_settings()
    if sets.type == 'pyspark':
        current = sets.recipe_settings['params']['sparkConfig']['inheritConf']
        print(recipe.name, current)
        if current == 'design-spark-rubix-small':
            sets.recipe_settings['params']['sparkConfig']['inheritConf'] = 'design-spark-small'
        elif current == 'design-spark':
            sets.recipe_settings['params']['sparkConfig']['inheritConf'] = 'design-spark-medium'
        sets.save()
However, when we run into visual recipes (joins, prepare, etc.), I've noticed that many recipes don't have Spark config metadata in their get_recipe_settings() output or other areas, even though they are indeed configured at the GUI level.
And after an answer from Dataiku support (seriously the best support team in the world), here are the steps for visual recipes:
client = dataiku.api_client()
proj = client.get_project("MY_PROJECT")
recipe = proj.get_recipe("MY_RECIPE")
sets = recipe.get_settings()
payload = sets.get_json_payload()
for spark_conf in payload["engineParams"]["sparkSQL"]["sparkConfig"]["conf"]:
    if spark_conf["key"] == "spark.example.foo":
        print("Updating Spark config:", spark_conf)
        spark_conf["value"] = "bar"
# write the edited payload back and persist it
sets.set_json_payload(payload)
sets.save()
Thanks for the information @importthepandas !
I ended up modifying it slightly on my end and leveraged the following to get what I needed:
client = dataiku.api_client()
dss_projects = client.list_projects()
for project in dss_projects:
    project_obj = client.get_project(project['projectKey'])
    recipes = project_obj.list_recipes()
    for item in recipes:
        recipe = project_obj.get_recipe(item['name'])
        settings = recipe.get_settings()
        status = recipe.get_status()
        if status.get_selected_engine_details()["type"] == "SPARK":
            spark_settings = settings.get_recipe_params()
From there I was able to find the settings within spark_settings["engineParams"]["spark"]["readParam"]["sparkConfig"]["inheritConf"]
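That lookup can be turned into an update in the same loop. Here is a small sketch of that step as a hypothetical helper; the nested keys are taken verbatim from the path above and may differ between DSS versions, so inspect the dict on your own instance before relying on them.

```python
# Hypothetical helper mirroring the path found above: swap the inherited
# Spark config name inside a visual recipe's params dict. Returns True
# when a change was made (so the caller knows to save the settings).
def swap_inherit_conf(recipe_params, old_name, new_name):
    conf = (recipe_params.get("engineParams", {})
                         .get("spark", {})
                         .get("readParam", {})
                         .get("sparkConfig", {}))
    if conf.get("inheritConf") == old_name:
        conf["inheritConf"] = new_name
        return True
    return False

# Example with config names from earlier in the thread
params = {"engineParams": {"spark": {"readParam": {"sparkConfig": {"inheritConf": "design-spark"}}}}}
if swap_inherit_conf(params, "design-spark", "design-spark-medium"):
    print(params["engineParams"]["spark"]["readParam"]["sparkConfig"]["inheritConf"])  # design-spark-medium
```

After a change, you would still call save() on the settings object, as in the earlier snippets.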
Hope this helps others 🙂