How do I create a recipe using the create_recipe function from the Dataiku Python API?
Hello,
I am trying to use the Dataiku Python API to create recipes from existing .json and .shaker files. I load the .json file and pass it as the recipe_proto argument of the create_recipe function (https://doc.dataiku.com/dss/latest/python-api/projects.html). Similarly, I load the .shaker file and pass it as the creation_settings argument. For reference, this is roughly what I am running (the file paths are placeholders):
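import json
import dataiku

client = dataiku.api_client()
project = client.get_project("DKU_TUTORIAL_MACHINE_LEARNING_BASICS_1")

# Load the exported recipe definition (.json) and its prepare-script settings (.shaker)
with open("/PATH/TO/recipe.json") as f:
    prototype = json.load(f)
with open("/PATH/TO/recipe.shaker") as fr:
    shakers = json.load(fr)

project.create_recipe(recipe_proto=prototype, creation_settings=shakers)

However, when I run create_recipe it returns this error: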
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body)
   1020                 stream = stream)
-> 1021         http_res.raise_for_status()
   1022         return http_res

~/dss/code-envs/python/Python_DL/lib/python3.6/site-packages/requests/models.py in raise_for_status(self)
    939         if http_error_msg:
--> 940             raise HTTPError(http_error_msg, response=self)
    941

HTTPError: 400 Client Error: Bad Request for url: http://127.0.0.1:10001/dip/publicapi/projects/DKU_TUTORIAL_MACHINE_LEARNING_BASICS_1/recipes/

During handling of the above exception, another exception occurred:

DataikuException                          Traceback (most recent call last)
<ipython-input-123-2948e5c54e43> in <module>
----> 1 project.create_recipe(recipe_proto = prototype, creation_settings = shakers)

~/dataiku-dss-9.0.1/python/dataikuapi/dss/project.py in create_recipe(self, recipe_proto, creation_settings)
   1178         definition = {'recipePrototype': recipe_proto, 'creationSettings' : creation_settings}
   1179         recipe_name = self.client._perform_json("POST", "/projects/%s/recipes/" % self.project_key,
-> 1180                                                 body = definition)['name']
   1181         return DSSRecipe(self.client, self.project_key, recipe_name)
   1182

~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_json(self, method, path, params, body, files, raw_body)
   1035
   1036     def _perform_json(self, method, path, params=None, body=None, files=None, raw_body=None):
-> 1037         return self._perform_http(method, path, params=params, body=body, files=files, stream=False, raw_body=raw_body).json()
   1038
   1039     def _perform_raw(self, method, path, params=None, body=None, files=None, raw_body=None):

~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body)
   1026             except ValueError:
   1027                 ex = {"message": http_res.text}
-> 1028             raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("message", "No message")))
   1029
   1030     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):

DataikuException: java.lang.IllegalArgumentException: Need to create output dataset or folder, but creationInfo params are suppressing it
I don't quite understand where the creationInfo params come from; I have tried removing certain keys from the .json file, but it still gives me the same error. How do I resolve this issue with create_recipe?
I realize an alternative is to build the recipe with the new_recipe method, following this documentation: https://doc.dataiku.com/dss/latest/python-api/flow.html#creating-a-python-recipe. However, that seems like a lot of work to set up, because each recipe type has its own builder methods, and we would need to extract specific values from both the json and shaker files and feed them into those methods to make it work. For example, the pattern from the linked documentation for a Python recipe looks roughly like this (the dataset names and connection are placeholders on my part):
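import dataiku

client = dataiku.api_client()
project = client.get_project("PROJECT_KEY")

# Builder pattern from the linked documentation; names are placeholders
builder = project.new_recipe("python")
builder = builder.with_input("myinputdataset")
builder = builder.with_new_output_dataset("myoutputdataset", "filesystem_managed")
builder = builder.with_script("# recipe code would go here")
recipe = builder.create()

Every other recipe type (prepare, grouping, ...) has its own builder with different methods, so I would have to translate the json/shaker contents into those calls by hand.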
Thank you in advance.
Answers
-
RoyE (Dataiker)
Hello!
According to our documentation, we recommend using new_recipe, as it can be quite difficult to get the parameters correct with create_recipe.
Is there a specific reason why you are using create_recipe? Could you give more details on your use case so we have a better idea of what you are trying to accomplish?
As you mentioned, the information that you need can be extracted and set as variables to accomplish the same goal. Please see the example below, which extracts the type, input, and output in order to create the recipe in the flow:
import dataiku
import json

client = dataiku.api_client()
project = client.get_project("PROJECT_NAME")  # ALL CAPS is required

# Load the exported recipe definition and shaker script
f = open("/PATH/TO/FILE.json")
fr = open("/PATH/TO/FILE.shaker")
recipe_proto = json.load(f)
creation_setting = json.load(fr)

# Extract the pieces the recipe builder needs
inputdataset = recipe_proto["inputs"]["main"]["items"][0]["ref"]
outputdataset = recipe_proto["outputs"]["main"]["items"][0]["ref"]
recipetype = recipe_proto["type"]

# Build the recipe in the flow with a new managed output dataset
builder = project.new_recipe(recipetype)
builder = builder.with_input(inputdataset)
builder = builder.with_new_output(outputdataset, "filesystem_managed", format_option_id="csv")
recipe = builder.create()
Thanks,
Roy
-
Hi @RoyE,
Thanks for your reply. The environment of the Dataiku instance that I am provided with only allows workgroup project creation through a macro. One of the reasons is that we need to follow certain naming conventions within the workgroup, so project creation is better handled through the macro.
My original idea was to import the project zip from an S3 bucket, load it in a notebook, take the recipes' json and shaker/join files, and change the project keys; then it should probably work. However, when I tried that, I got the error shown above.
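By changing the project keys, I mean something like this (I am assuming the exported prototype carries a projectKey field; I have not verified the exact structure of the export):

import json

# Hypothetical sketch: repoint the exported recipe definition at the new project.
# The "projectKey" field is an assumption; the real export structure may differ.
with open("/PATH/TO/recipe.json") as f:
    recipe_proto = json.load(f)
recipe_proto["projectKey"] = "NEW_PROJECT_KEY"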
In the code that you attached, it seems that I need to adjust the steps of the recipe based on its type: if it is of the "grouping" type, then I need the with_group_key function; similarly, if the recipe type is "prepare", I may need the add_filter_on_bad_meaning function (depending on my recipe steps). Something like the dispatch sketched below. I think it would be great if we could have a function that takes both the json and shaker files directly to create the recipe.
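A rough, untested sketch of the per-type dispatch I have in mind, reusing the variables from your example (the creation_setting key name is a guess on my part):

builder = project.new_recipe(recipetype)
builder = builder.with_input(inputdataset)
builder = builder.with_new_output(outputdataset, "filesystem_managed", format_option_id="csv")

# Per-type settings have to be wired in by hand
if recipetype == "grouping":
    # "groupKey" is a hypothetical key; where this lives in the config may differ
    builder = builder.with_group_key(creation_setting.get("groupKey"))

recipe = builder.create()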
-
Hi Suchi92,
This specific error is occurring because you need to create the output dataset first (using the create_dataset() method). For example, the following sample code reads the corresponding config files for a sync recipe, creates the output dataset, and then uses create_recipe() to create the sync recipe:
import dataiku
import json

# Establish the client
client = dataiku.api_client()
project = client.get_project("PROJECT_KEY")

# Read the local files
f = open("/PATH/TO/FILE.json")
# fr = open("/PATH/TO/FILE.sync")  # This file is empty so commenting out
recipe_proto = json.load(f)
creation_settings = {}  # Nothing needed for sync so using an empty dict

# Retrieve the output dataset name from the config file; it must be created first
output_dataset = recipe_proto["outputs"]["main"]["items"][0]["ref"]

# Set the necessary connection params
params = {
    'connection': 'connection_name',
    'path': dataiku.default_project_key() + "/" + output_dataset,
    'table': output_dataset,
    'mode': 'table'
}

# Create the dataset
dataset = project.create_dataset(output_dataset, type='PostgreSQL', params=params)

# Set the dataset to managed
ds_def = dataset.get_definition()
ds_def['managed'] = True
dataset.set_definition(ds_def)

# Create the recipe
project.create_recipe(recipe_proto, creation_settings)
However, it's worth mentioning that there are some known issues with using this method to create a prepare recipe if the output already exists (this is in our backlog to fix). Additionally, you would still need to modify the logic a bit based on the recipe type anyway (perhaps with a set of conditionals that check the recipe type, plus a guard like the sketch below), so you may be better off using the new_recipe() method as previously mentioned. In other words, you won't be able to simply read in the corresponding json and recipe config files, even if you use the create_recipe() function directly.
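As a rough illustration only, reusing the names from the snippet above and assuming list_datasets() entries expose a "name" field in this API version:

# Sketch: only create the output dataset if it does not already exist
existing_names = [d["name"] for d in project.list_datasets()]
if output_dataset not in existing_names:
    dataset = project.create_dataset(output_dataset, type='PostgreSQL', params=params)
    ds_def = dataset.get_definition()
    ds_def['managed'] = True
    dataset.set_definition(ds_def)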
Best,
Andrew