Python script to export any kind of recipe into SQL?
I am in charge of migrating a whole Dataiku project to GCP BigQuery.
In my Dataiku project, I have around 90 input tables and 3 output tables; between those tables there are around 500 different jobs (mostly JOIN and GROUP BY) and a single Python job.
Answers
-
Keiji (Dataiker)
Hello @Pizarro75,
Thank you so much for posting the question on Community.
For recipes whose type is something other than SQL (such as Join and Prepare) but whose engine is "In-database (SQL)", would you please check whether you can retrieve their SQL queries as follows?
import dataiku
import pprint

client = dataiku.api_client()
project = client.get_default_project()
recipe = project.get_recipe('your_recipe_name')
status = recipe.get_status()
sql = status.data['sql']
pprint.pprint(sql)
I hope this helps.
Sincerely,
Keiji, Dataiku Technical Support
-
Thank you Keiji, this is working perfectly. I actually submitted a ticket on the Dataiku support page and they were not able to find a way to help me. I just updated your script and I am getting the results I was expecting.
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipe = project.get_recipe('compute_MAST_STKO_STPO')
status = recipe.get_status()
sql = status.data['sql']
print(sql)

However, when I am trying to loop over all my recipes (over 500), this is not working. My two cents is that I cannot get the status for every type of recipe.
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    recipe = project.get_recipe(recipe)
    status = recipe.get_status()  # --> line that is returning an error

Error log:
DataikuException: java.lang.IllegalArgumentException: "{u'inputs': {u'main': {u'items': [{u'ref': u'BOM_SQL', u'deps': []}]}}, u'maxRunningActivities': 0, u'name': u'compute_BOM', u'versionTag': {u'lastModifiedBy': {u'login': u'KarenR'}, u'lastModifiedOn': 1618396290950, u'versionNumber': 13}, u'optionalDependencies': False, u'customMeta': {u'kv': {}}, u'outputs': {u'main': {u'items': [{u'appendMode': False, u'ref': u'BOM'}]}}, u'tags': [u'Input_sync_recipes'], u'creationTag': {u'lastModifiedBy': {u'login': u'amaury_Lethy'}, u'lastModifiedOn': 1580286049898, u'versionNumber': 0}, u'params': {u'engineParams': {u'tdchParams': {u'splitMode': u'DEFAULT', u'numberOfExecutors': 2}, u'maxThreads': 4, u'hive': {u'skipPrerunValidate': False, u'hiveconf': [], u'executionEngine': u'HIVECLI_GLOBAL', u'inheritConf': u'default', u'addDkuUdf': False}, u'sparkSQL': {u'overwriteOutputSchema': False, u'useGlobalMetastore': False, u'executionEngine': u'DATABRICKS', u'readParams': {u'map': {}, u'mode': u'AUTO', u'autoModeRepartitionInto': 10}, u'sparkConfig': {u'inheritConf': u'default', u'conf': []}, u'pipelineAllowMerge': True, u'pipelineAllowStart': True, u'skipPrerunValidate': False}, u'lowerCaseSchemaIfEngineRequiresIt': False, u'sqlPipelineParams': {u'pipelineAllowStart': True, u'pipelineAllowMerge': True}, u'impala': {u'forceStreamMode': False}}, u'forcePipelineableForTests': False, u'schemaMode': u'STRICT_SYNC'}, u'checklists': {u'checklists': []}, u'redispatchPartitioning': False, u'neverRecomputeExistingPartitions': False, u'type': u'sync', u'customFields': {}, u'projectKey': u'MDMPLANIF'}.json" is not a valid file/directory name (forbidden characters or too long)

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    recipe = project.get_recipe(recipe)
    print(recipe['type'])

Result:
sync grouping shaker grouping join shaker shaker sync shaker join shaker shaker sync shaker sync sampling shaker sync join vstack grouping shaker sampling shaker sync sync python join shaker join shaker grouping shaker grouping shaker grouping shaker join shaker grouping sync vstack join shaker sampling shaker grouping grouping shaker sync shaker shaker sync grouping shaker shaker join join join shaker grouping grouping sync grouping shaker shaker grouping shaker grouping shaker sync grouping shaker join sync sync
-
Keiji (Dataiker)
Hello @Pizarro75,
Thank you so much for the confirmation and the response.
Would you please try the following code?
client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    recipe = project.get_recipe(recipe.name)
    status = recipe.get_status()

DSSProject.list_recipes() returns a list of DSSRecipeListItem objects, and DSSProject.get_recipe() must take a string value of a recipe name as its parameter.
I hope this helps.
Sincerely,
Keiji, Dataiku Technical Support
-
Thanks. I am getting:
AttributeError                            Traceback (most recent call last)
<ipython-input-77-230328bed0fc> in <module>()
      3 recipes = project.list_recipes()
      4 for recipe in recipes:
----> 5     recipe = project.get_recipe(recipe.name)
      6     status = recipe.get_status()

AttributeError: 'dict' object has no attribute 'name'
-
Keiji (Dataiker)
Thank you so much for checking. Then, would you try the following code for the retrieval of the recipe?
recipe = project.get_recipe(recipe['name'])
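For the loop itself, a minimal sketch that tolerates both possible return types of list_recipes() — plain dicts on older DSS versions (as in this project) and DSSRecipeListItem objects on newer ones — could look like this; the isinstance check is only an illustration, not part of the DSS API:

import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')

for item in project.list_recipes():
    # Older DSS versions return plain dicts, newer ones return DSSRecipeListItem objects
    name = item['name'] if isinstance(item, dict) else item.name
    recipe = project.get_recipe(name)
    status = recipe.get_status()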
-
Thank you @KeijiY! This is working using this code snippet. I updated my code following your suggestions:

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')
recipes = project.list_recipes()
for recipe in recipes:
    recipe = project.get_recipe(recipe['name'])
    status = recipe.get_status()
    sql = status.data['sql']

Error log:

KeyError                                  Traceback (most recent call last)
<ipython-input-81-91641dd714fa> in <module>()
      5     recipe = project.get_recipe(recipe['name'])
      6     status = recipe.get_status()
----> 7     sql = status.data['sql']

KeyError: 'sql'
I also just figured out that I am not using a recent version of Dataiku DSS, which is probably why your code sometimes does not work. Thanks!
-
Keiji (Dataiker)
Hello @Pizarro75,
Thank you so much for the confirmation.
As not all recipes have a SQL query (e.g. a recipe using the DSS engine does not have a SQL query), you will need to check the existence of the 'sql' key before retrieving a SQL query as follows:
if 'sql' in status.data:
    sql = status.data['sql']
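For reference, a minimal end-to-end sketch combining the pieces from this thread — looping over all recipes, checking for the 'sql' key, and writing each query to a file — might look like the following; the output directory and file-naming convention are assumptions for illustration, not part of the DSS API:

import os
import dataiku

client = dataiku.api_client()
project = client.get_project('MDMPLANIF')

out_dir = '/tmp/recipe_sql_export'  # hypothetical export location
os.makedirs(out_dir, exist_ok=True)

for item in project.list_recipes():
    name = item['name'] if isinstance(item, dict) else item.name
    status = project.get_recipe(name).get_status()
    # Only recipes that run on a SQL engine expose a generated query
    if 'sql' in status.data:
        with open(os.path.join(out_dir, name + '.sql'), 'w') as f:
            f.write(status.data['sql'])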
-
Thank you @KeijiY, this is working! I was not expecting that, honestly. I have a support ticket in progress and they weren't able to figure it out. Can you keep this conversation open in case I have any other questions regarding that script in the next couple of hours?