How to create a recipe using the create_recipe function from the Dataiku Python API?

suchi92
suchi92 Registered Posts: 2 ✭✭✭✭

Hello,

I tried to use the Dataiku Python API to create recipes from both .json and .shaker files. I load the .json file and pass it as the recipe_proto argument of the create_recipe function (https://doc.dataiku.com/dss/latest/python-api/projects.html). Similarly, I load the .shaker file and pass it as the creation_settings argument. However, when I run create_recipe, it returns the following error.

    ---------------------------------------------------------------------------
    HTTPError                                 Traceback (most recent call last)
    ~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body)
       1020                     stream = stream)
    -> 1021             http_res.raise_for_status()
       1022             return http_res

    ~/dss/code-envs/python/Python_DL/lib/python3.6/site-packages/requests/models.py in raise_for_status(self)
        939         if http_error_msg:
    --> 940             raise HTTPError(http_error_msg, response=self)
        941

    HTTPError: 400 Client Error: Bad Request for url: http://127.0.0.1:10001/dip/publicapi/projects/DKU_TUTORIAL_MACHINE_LEARNING_BASICS_1/recipes/

    During handling of the above exception, another exception occurred:

    DataikuException                          Traceback (most recent call last)
    <ipython-input-123-2948e5c54e43> in <module>
    ----> 1 project.create_recipe(recipe_proto = prototype, creation_settings = shakers)

    ~/dataiku-dss-9.0.1/python/dataikuapi/dss/project.py in create_recipe(self, recipe_proto, creation_settings)
       1178         definition = {'recipePrototype': recipe_proto, 'creationSettings' : creation_settings}
       1179         recipe_name = self.client._perform_json("POST", "/projects/%s/recipes/" % self.project_key,
    -> 1180                        body = definition)['name']
       1181         return DSSRecipe(self.client, self.project_key, recipe_name)
       1182

    ~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_json(self, method, path, params, body, files, raw_body)
       1035
       1036     def _perform_json(self, method, path, params=None, body=None,files=None, raw_body=None):
    -> 1037         return self._perform_http(method, path,  params=params, body=body, files=files, stream=False, raw_body=raw_body).json()
       1038
       1039     def _perform_raw(self, method, path, params=None, body=None,files=None, raw_body=None):

    ~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body)
       1026             except ValueError:
       1027                 ex = {"message": http_res.text}
    -> 1028             raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("message", "No message")))
       1029
       1030     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):

    DataikuException: java.lang.IllegalArgumentException: Need to create output dataset or folder, but creationInfo params are suppressing it

I don't quite understand where the creationInfo params come from; I have tried removing certain keys from the .json file, but it still gives me the same error. How do I resolve this issue with create_recipe?

I realized that an alternative is to use the new_recipe method, following this documentation: https://doc.dataiku.com/dss/latest/python-api/flow.html#creating-a-python-recipe. However, I think it is hard work to set up, because each recipe type has its own methods and we need to extract certain values from both the .json and .shaker files and pass them to those methods to make it work.

Thank you in advance.

Answers

  • RoyE
    RoyE Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 31 Dataiker

    Hello!

    According to our documentation, we recommend using new_recipe, as it can be quite difficult to get the parameters of create_recipe correct.

    Is there a specific reason why you are using create_recipe? Could you give more details on your use case, so that we have a better idea of what you are trying to accomplish?

    As you mentioned, the information that you need can be extracted and stored in variables to achieve the same goal. Please see the example below, which extracts the type, input, and output in order to create the recipe in the Flow.

        import dataiku
        import json

        client = dataiku.api_client()
        project = client.get_project("PROJECT_NAME")  # ALL CAPS is required

        # Load the recipe definition and the shaker file
        with open("/PATH/TO/FILE.json") as f:
            recipe_proto = json.load(f)
        with open("/PATH/TO/FILE.shaker") as fr:
            creation_setting = json.load(fr)  # loaded here but not used below

        # Extract the recipe type and its first input and output
        inputdataset = recipe_proto["inputs"]["main"]["items"][0]["ref"]
        outputdataset = recipe_proto["outputs"]["main"]["items"][0]["ref"]
        recipetype = recipe_proto["type"]

        # Create the recipe in the Flow through the recipe builder
        builder = project.new_recipe(recipetype)
        builder = builder.with_input(inputdataset)
        builder = builder.with_new_output(outputdataset, "filesystem_managed", format_option_id="csv")
        recipe = builder.create()
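    The extraction step in the snippet above could also be factored into a small helper that fails with a clear message when a key is missing. This is only a sketch: it assumes the same JSON layout as above (inputs/outputs, then main, then items, then ref), and the helper name extract_recipe_io and the sample definition are hypothetical, not part of the Dataiku API.

```python
import json


def extract_recipe_io(recipe_proto, role="main"):
    """Return (recipe_type, input_ref, output_ref) from a recipe definition dict.

    Assumes the layout recipe_proto["inputs"][role]["items"][0]["ref"],
    and the same layout under "outputs".
    """
    def first_ref(section):
        # Walk the nested dicts defensively so a missing key gives a clear error
        items = recipe_proto.get(section, {}).get(role, {}).get("items", [])
        if not items:
            raise ValueError("recipe has no %s under role %r" % (section, role))
        return items[0]["ref"]

    if "type" not in recipe_proto:
        raise ValueError("recipe definition has no 'type' key")
    return recipe_proto["type"], first_ref("inputs"), first_ref("outputs")


# Hypothetical recipe definition, shaped like the exported recipe .json
sample = json.loads("""
{
  "type": "sync",
  "inputs": {"main": {"items": [{"ref": "orders"}]}},
  "outputs": {"main": {"items": [{"ref": "orders_copy"}]}}
}
""")

recipe_type, input_ref, output_ref = extract_recipe_io(sample)
print(recipe_type, input_ref, output_ref)
```

    The three returned values can then be passed to project.new_recipe(), with_input() and with_new_output() exactly as in the snippet above.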

    Thanks,

    Roy

  • suchi92
    suchi92 Registered Posts: 2 ✭✭✭✭

    Hi @RoyE,

    Thanks for your reply. The Dataiku instance that I am provided with only allows workgroup project creation through a macro. One of the reasons is that we need to follow certain naming conventions within the workgroup, so project creation is better handled through the macro.

    My original idea was to import the project zip from an S3 bucket, load it in a notebook, take the recipes' .json and .shaker / .join files, and change the project keys, after which it should probably work. However, when I tried to do that, I got the error shown above.

    In the code that you have attached, it seems that I need to adjust the steps of the recipe based on its type: if it is of "grouping" type, then I need to call the with_group_key function. Similarly, if the recipe type is "prepare", I may need the add_filter_on_bad_meaning function (depending on my recipe steps). I think it would be great to have a function that takes both the .json and the .shaker files directly to create the recipe.

  • ATsao
    ATsao Dataiker Alumni, Registered Posts: 139 ✭✭✭✭✭✭✭✭

    Hi Suchi92,

    This specific error occurs because you need to create the output dataset first (using the create_dataset() method). For example, the following sample code reads the corresponding config file for a sync recipe, creates the output dataset, and then uses create_recipe() to create the sync recipe.

        import dataiku
        import json

        # Establish the client
        client = dataiku.api_client()
        project = client.get_project("PROJECT_KEY")

        # Read the local files
        with open("/PATH/TO/FILE.json") as f:
            recipe_proto = json.load(f)
        # fr = open("/PATH/TO/FILE.sync") # This file is empty so commenting out
        creation_settings = {}  # Nothing needed for sync, so passing an empty dict

        # Retrieve the output dataset name from the config file; it is created first
        output_dataset = recipe_proto["outputs"]["main"]["items"][0]["ref"]

        # Set the necessary connection params
        params = {
            'connection': 'connection_name',
            'path': dataiku.default_project_key() + "/" + output_dataset,
            'table': output_dataset,
            'mode': 'table',
        }

        # Create the dataset
        dataset = project.create_dataset(output_dataset, type='PostgreSQL', params=params)

        # Set dataset to managed
        ds_def = dataset.get_definition()
        ds_def['managed'] = True
        dataset.set_definition(ds_def)

        # Create recipe
        project.create_recipe(recipe_proto, creation_settings)

    However, it's worth mentioning that there are some known issues with using this method to create a prepare recipe if the output already exists (fixing this is in our backlog). Additionally, you would still need to adapt the logic to the recipe type anyway (perhaps with a set of conditionals that check the recipe type), so you may be better off using the new_recipe() method as previously mentioned. In other words, you won't be able to simply read in the recipe's json and config file and pass them through unchanged, even if you use the create_recipe() function directly.
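    The idea of branching on the recipe type could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe migration: with_input, with_new_output and with_group_key are the builder methods named earlier in this thread, while RecordingBuilder, the options dict and its "group_key" entry are hypothetical names used here so the dispatch logic can be dry-run without a DSS instance.

```python
class RecordingBuilder:
    """Hypothetical dry-run stand-in for the builder returned by project.new_recipe()."""

    def __init__(self):
        self.calls = []

    def with_input(self, ref):
        self.calls.append(("with_input", ref))
        return self

    def with_new_output(self, name, connection):
        self.calls.append(("with_new_output", name, connection))
        return self

    def with_group_key(self, column):
        self.calls.append(("with_group_key", column))
        return self


def configure_grouping(builder, options):
    # Assumption: the caller supplies the grouping column explicitly
    if "group_key" not in options:
        raise ValueError("grouping recipes need options['group_key']")
    return builder.with_group_key(options["group_key"])


# One entry per recipe type that needs extra builder calls
TYPE_SPECIFIC_STEPS = {"grouping": configure_grouping}


def configure_builder(builder, recipe_proto, connection, options=None):
    """Apply the common builder calls, then any type-specific ones."""
    input_ref = recipe_proto["inputs"]["main"]["items"][0]["ref"]
    output_ref = recipe_proto["outputs"]["main"]["items"][0]["ref"]
    builder = builder.with_input(input_ref)
    builder = builder.with_new_output(output_ref, connection)
    step = TYPE_SPECIFIC_STEPS.get(recipe_proto["type"])
    if step is not None:
        builder = step(builder, options or {})
    return builder


# Hypothetical grouping recipe definition
grouping_proto = {
    "type": "grouping",
    "inputs": {"main": {"items": [{"ref": "orders"}]}},
    "outputs": {"main": {"items": [{"ref": "orders_by_customer"}]}},
}

dry_run = configure_builder(
    RecordingBuilder(), grouping_proto, "filesystem_managed",
    {"group_key": "customer_id"},
)
for call in dry_run.calls:
    print(call)
```

    Against a real instance, the RecordingBuilder would be replaced by project.new_recipe(recipe_proto["type"]), followed by a call to create() on the configured builder. Other recipe types, such as prepare, would need their own entries in TYPE_SPECIFIC_STEPS.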

    Best,
    Andrew
