Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Added on November 26, 2021 5:19PM
Likes: 0
Replies: 8
I am in charge to migrate a whole Dataiku project to GCP BigQuery.
In my Dataiku project, I have around 90 input tables and 3 output tables, between those tables, around 500 different jobs (mostly JOIN, GROUP BY) and a single Python job.
Hello @Pizarro75
,
Thank you so much for posting the question on Community.
For the recipes whose types are other than SQL (such as Join and Prepare) and engines are "In-database (SQL)", would you please check if you can retrieve their SQL queries as follows?
import dataiku import pprint client = dataiku.api_client() project = client.get_default_project() recipe = project.get_recipe('your_recipe_name') status = recipe.get_status() sql = status.data['sql'] pprint.pprint(sql)
I hope this would help.
Sincerely,
Keiji, Dataiku Technical Support
Thank you Keiji, this is working perfectly. I actually submitted a ticket in Dataiku suppot page and they were not able to find a way to help me. I just updated your script and I am getting the results I was expecting.
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipe = project.get_recipe('compute_MAST_STKO_STPO')
status = recipe.get_status()
sql = status.data['sql']
print(sql)
However, when I a trying to loop over all my recipes (over 500), this is not working. My 2 cents is that I can not get the status to every type of recipes.
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
recipe = project.get_recipe(recipe)
status = recipe.get_status() --> line that returning an error
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
recipe = project.get_recipe(recipe)
print(recipe['type'])
Result:
sync
grouping
shaker
grouping
join
shaker
shaker
sync
shaker
join
shaker
shaker
sync
shaker
sync
sampling
shaker
sync
join
vstack
grouping
shaker
sampling
shaker
sync
sync
python
join
shaker
join
shaker
grouping
shaker
grouping
shaker
grouping
shaker
join
shaker
grouping
sync
vstack
join
shaker
sampling
shaker
grouping
grouping
shaker
sync
shaker
shaker
sync
grouping
shaker
shaker
join
join
join
shaker
grouping
grouping
sync
grouping
shaker
shaker
grouping
shaker
grouping
shaker
sync
grouping
shaker
join
sync
sync
Hello @Pizarro75
,
Thank you so much for the confirmation and the response.
Would you please try the following code?
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
recipe = project.get_recipe(recipe.name)
status = recipe.get_status()
DSSProject.list_recipes() returns a list of DSSRecipeListItem objects, and DSSProject.get_recipe() must take a string value of a recipe name as its parameter.
I hope this would help.
Sincerely,
Keiji, Dataiku Technical Support
Thanks. I am getting:
AttributeErrorTraceback (most recent call last) <ipython-input-77-230328bed0fc> in <module>() 3 recipes = project.list_recipes() 4 for recipe in recipes: ----> 5 recipe = project.get_recipe(recipe.name) 6 status = recipe.get_status() AttributeError: 'dict' object has no attribute 'name'
Thank you so much for checking. Then, would you try the following code for the retrieval of the recipe?
recipe = project.get_recipe(recipe['name'])
Thank you @KeijiY
! This is working using this code snippet. I updated my code following your suggestions:
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
recipe = project.get_recipe(recipe['name'])
status = recipe.get_status()
sql = status.data['sql']
KeyErrorTraceback (most recent call last) <ipython-input-81-91641dd714fa> in <module>() 5 recipe = project.get_recipe(recipe['name']) 6 status = recipe.get_status() ----> 7 sql = status.data['sql'] KeyError: 'sql'
I also just figured it out that I not using a recent version of Dataiku DSS, this is probably why sometimes your code does not work. THanks
Hello @Pizarro75
,
Thank you so much for the confirmation.
As not all recipes have a SQL query (e.g. a recipe using the DSS engine does not have a SQL query), you will need to check the existence of the 'sql' key before retrieving a SQL query as follows:
if 'sql' in status.data: sql = status.data['sql']
Thank you @KeijiY
, this is working! I was not expected that honestly. I have a support ticket in progress and they weren't able to figure out. Can you keep this conversation open if I have any other questions regarding to that script in the next couple of hours?