I am in charge of migrating a whole Dataiku project to GCP BigQuery.
My Dataiku project has around 90 input tables and 3 output tables. Between those tables there are around 500 different jobs (mostly JOIN and GROUP BY) and a single Python job.
Hello @Pizarro75,
Thank you so much for posting the question on the Community.
For recipes that are not SQL recipes (such as Join and Prepare) but whose engine is "In-database (SQL)", could you please check whether you can retrieve their SQL queries as follows?
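import dataiku
import pprint

# Connect to DSS from a notebook or recipe and open the current project
client = dataiku.api_client()
project = client.get_default_project()
# Replace with the name of a visual recipe that uses the In-database (SQL) engine
recipe = project.get_recipe('your_recipe_name')
# For such recipes, the status data includes the generated SQL under the 'sql' key
status = recipe.get_status()
sql = status.data['sql']
pprint.pprint(sql)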
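The steps above can be wrapped in a small reusable function. This is a minimal sketch, not part of the dataiku API: `get_recipe_sql` is a hypothetical helper name, and it relies only on the calls already shown (`project.get_recipe(name)`, `recipe.get_status()` and the `status.data` dict). It returns `None` instead of raising when the status has no `'sql'` entry.

```python
def get_recipe_sql(project, recipe_name):
    """Return the SQL generated for one recipe, or None if the status has no 'sql' key.

    `project` is assumed to behave like the DSSProject used above, e.g.
    dataiku.api_client().get_default_project(). get_recipe_sql is a
    hypothetical helper, not part of the dataiku API.
    """
    recipe = project.get_recipe(recipe_name)  # recipe name as a string
    status = recipe.get_status()              # fetch the recipe status from DSS
    return status.data.get('sql')             # status.data is a plain dict


# Usage inside DSS ('your_recipe_name' is a placeholder, as above):
# import dataiku
# project = dataiku.api_client().get_default_project()
# print(get_recipe_sql(project, 'your_recipe_name'))
```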
I hope this would help.
Sincerely,
Keiji, Dataiku Technical Support
Thank you Keiji, this is working perfectly. I had actually submitted a ticket through the Dataiku support page, and they were not able to help me. I just adapted your script and I am getting the results I expected:
import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipe = project.get_recipe('compute_MAST_STKO_STPO')
status = recipe.get_status()
sql = status.data['sql']
print(sql)
However, when I try to loop over all my recipes (over 500), it does not work. My guess is that get_status() does not work for every recipe type.
import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    recipe = project.get_recipe(recipe)
    status = recipe.get_status()  # <-- this line raises an error
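Listing the type of each recipe does work:
import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    # Each list item is indexed by key, so its type can be read without get_recipe()
    print(recipe['type'])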
Result:
sync grouping shaker grouping join shaker shaker sync shaker join shaker shaker sync shaker sync sampling shaker sync join vstack grouping shaker sampling shaker sync sync python join shaker join shaker grouping shaker grouping shaker grouping shaker join shaker grouping sync vstack join shaker sampling shaker grouping grouping shaker sync shaker shaker sync grouping shaker shaker join join join shaker grouping grouping sync grouping shaker shaker grouping shaker grouping shaker sync grouping shaker join sync sync
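For a project of this size, tallying the printed types gives a quicker overview than the raw list. A minimal sketch, assuming each item returned by `project.list_recipes()` supports `item['type']` (as the items did in this thread); `count_recipe_types` is a hypothetical helper name.

```python
from collections import Counter


def count_recipe_types(recipe_items):
    """Tally recipe types (e.g. 'join', 'grouping', 'shaker') across list items.

    Each item is assumed to support item['type'], as the items returned by
    project.list_recipes() did in this thread.
    """
    return Counter(item['type'] for item in recipe_items)


# Usage inside DSS:
# import dataiku
# project = dataiku.api_client().get_project('MDMPLANIF')
# for recipe_type, n in count_recipe_types(project.list_recipes()).most_common():
#     print(recipe_type, n)
```

In this listing, `shaker` is the internal type name of Prepare recipes and `grouping` that of Group recipes, which helps map the counts back to the Flow.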
Hello @Pizarro75,
Thank you so much for the confirmation and the response.
Would you please try the following code?
import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    # get_recipe() expects the recipe name, not the list item itself
    recipe = project.get_recipe(recipe.name)
    status = recipe.get_status()
DSSProject.list_recipes() returns a list of DSSRecipeListItem objects, while DSSProject.get_recipe() expects the recipe name as a string.
I hope this would help.
Sincerely,
Keiji, Dataiku Technical Support
Thanks. I am getting:
AttributeError                            Traceback (most recent call last)
<ipython-input-77-230328bed0fc> in <module>()
      3 recipes = project.list_recipes()
      4 for recipe in recipes:
----> 5     recipe = project.get_recipe(recipe.name)
      6     status = recipe.get_status()

AttributeError: 'dict' object has no attribute 'name'
Thank you so much for checking. In that case, could you try retrieving the recipe as follows?
recipe = project.get_recipe(recipe['name'])
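Indexing with `['name']` works here because, in this DSS version, `list_recipes()` returned plain dicts (hence the `AttributeError` on `.name`). A minimal sketch of a helper that accepts both plain dicts and objects exposing a `.name` attribute, such as the `DSSRecipeListItem` objects mentioned above; `recipe_names` is a hypothetical helper name, not part of the dataiku API.

```python
def recipe_names(recipe_items):
    """Return the recipe names from the items returned by project.list_recipes().

    Plain dicts (as seen in this thread) are read with item['name'];
    any other object is assumed to expose a .name attribute.
    """
    names = []
    for item in recipe_items:
        names.append(item['name'] if isinstance(item, dict) else item.name)
    return names


# Usage inside DSS:
# import dataiku
# project = dataiku.api_client().get_project('MDMPLANIF')
# for name in recipe_names(project.list_recipes()):
#     recipe = project.get_recipe(name)  # get_recipe() expects the name string
```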
Thank you @KeijiY! That snippet works. I updated my code following your suggestions:
import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    recipe = project.get_recipe(recipe['name'])
    status = recipe.get_status()
    sql = status.data['sql']
Now I get the following error:
KeyError                                  Traceback (most recent call last)
<ipython-input-81-91641dd714fa> in <module>()
      5     recipe = project.get_recipe(recipe['name'])
      6     status = recipe.get_status()
----> 7     sql = status.data['sql']

KeyError: 'sql'
I also just realized that I am not using a recent version of Dataiku DSS, which is probably why your code sometimes does not work. Thanks!
Hello @Pizarro75,
Thank you so much for the confirmation.
Not every recipe has a SQL query (for example, a recipe running on the DSS engine does not), so you need to check that the 'sql' key exists before reading it:
if 'sql' in status.data:
    sql = status.data['sql']
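Putting the pieces of this thread together, here is a sketch of the full loop: it visits every recipe, keeps the SQL of those whose status contains a `'sql'` key, and records the others so they can be reviewed separately (for example, recipes running on the DSS engine, or the Python recipe). `collect_recipe_sql` is a hypothetical helper name; it assumes only the calls used in this thread and list items indexed by `['name']`.

```python
def collect_recipe_sql(project):
    """Return (sql_by_recipe, recipes_without_sql) for every recipe in `project`.

    `project` is assumed to behave like the DSSProject used in this thread,
    e.g. dataiku.api_client().get_project('MDMPLANIF').
    """
    sql_by_recipe = {}
    recipes_without_sql = []
    for item in project.list_recipes():
        name = item['name']  # list items were plain dicts in this thread
        status = project.get_recipe(name).get_status()
        if 'sql' in status.data:  # not every recipe has a SQL query
            sql_by_recipe[name] = status.data['sql']
        else:
            recipes_without_sql.append(name)
    return sql_by_recipe, recipes_without_sql


# Usage inside DSS:
# import dataiku
# project = dataiku.api_client().get_project('MDMPLANIF')
# sql_by_recipe, skipped = collect_recipe_sql(project)
# for name, sql in sql_by_recipe.items():
#     print('--', name)
#     print(sql)
```

Returning the skipped names, rather than silently ignoring them, makes it easier to see which recipes still need a separate look during the migration.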
Thank you @KeijiY, this is working! Honestly, I was not expecting that. I have a support ticket in progress and they were not able to figure it out. Could you keep this conversation open in case I have other questions about this script in the next couple of hours?