
How to create recipe using create_recipe function from Dataiku Python API?

suchi92
Level 1

Hello, 

I tried to use the Dataiku Python API to create recipes from both .json and .shaker files. I load the .json file and pass it as the recipe_proto argument of the create_recipe function (https://doc.dataiku.com/dss/latest/python-api/projects.html). Similarly, I load the .shaker file and pass it as the creation_settings argument. However, when I run create_recipe, it returns the following error:

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body)
   1020                     stream = stream)
-> 1021             http_res.raise_for_status()
   1022             return http_res

~/dss/code-envs/python/Python_DL/lib/python3.6/site-packages/requests/models.py in raise_for_status(self)
    939         if http_error_msg:
--> 940             raise HTTPError(http_error_msg, response=self)
    941 

HTTPError: 400 Client Error: Bad Request for url: http://127.0.0.1:10001/dip/publicapi/projects/DKU_TUTORIAL_MACHINE_LEARNING_BASICS_1/recipes/

During handling of the above exception, another exception occurred:

DataikuException                          Traceback (most recent call last)
<ipython-input-123-2948e5c54e43> in <module>
----> 1 project.create_recipe(recipe_proto = prototype, creation_settings = shakers)

~/dataiku-dss-9.0.1/python/dataikuapi/dss/project.py in create_recipe(self, recipe_proto, creation_settings)
   1178         definition = {'recipePrototype': recipe_proto, 'creationSettings' : creation_settings}
   1179         recipe_name = self.client._perform_json("POST", "/projects/%s/recipes/" % self.project_key,
-> 1180                        body = definition)['name']
   1181         return DSSRecipe(self.client, self.project_key, recipe_name)
   1182 

~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_json(self, method, path, params, body, files, raw_body)
   1035 
   1036     def _perform_json(self, method, path, params=None, body=None,files=None, raw_body=None):
-> 1037         return self._perform_http(method, path,  params=params, body=body, files=files, stream=False, raw_body=raw_body).json()
   1038 
   1039     def _perform_raw(self, method, path, params=None, body=None,files=None, raw_body=None):

~/dataiku-dss-9.0.1/python/dataikuapi/dssclient.py in _perform_http(self, method, path, params, body, stream, files, raw_body)
   1026             except ValueError:
   1027                 ex = {"message": http_res.text}
-> 1028             raise DataikuException("%s: %s" % (ex.get("errorType", "Unknown error"), ex.get("message", "No message")))
   1029 
   1030     def _perform_empty(self, method, path, params=None, body=None, files = None, raw_body=None):

DataikuException: java.lang.IllegalArgumentException: Need to create output dataset or folder, but creationInfo params are suppressing it

I don't quite understand where the creationInfo params come from. I have tried removing certain keys from the .json file, but I still get the same error. How do I resolve this issue with create_recipe?

I realized that an alternative is to use the new_recipe method, following this documentation: https://doc.dataiku.com/dss/latest/python-api/flow.html#creating-a-python-recipe. However, I think it would be hard work to set up, because each recipe type has its own methods, and we would need to extract certain values from both the json and shaker files and pass them to those methods to make it work.

Thank you in advance. 

3 Replies
RoyE
Dataiker

Hello!

According to our documentation, we recommend using new_recipe, because it can be quite difficult to get the parameters right with create_recipe.

Is there a specific reason why you are using create_recipe? Could you give more details on your use case, so that we have a better idea of what you are trying to accomplish?

As you mentioned, the information that you need can be extracted and stored in variables to achieve the same goal. Please see the example below, which extracts the type, input, and output in order to create the recipe in the Flow.

import dataiku
import json

client = dataiku.api_client()
project = client.get_project("PROJECT_NAME") #ALL CAPS is required

with open("/PATH/TO/FILE.json") as f:
    recipe_proto = json.load(f)
with open("/PATH/TO/FILE.shaker") as fr:
    creation_setting = json.load(fr)

inputdataset = recipe_proto["inputs"]["main"]["items"][0]["ref"]
outputdataset = recipe_proto["outputs"]["main"]["items"][0]["ref"]
recipetype = recipe_proto["type"]

builder = project.new_recipe(recipetype)
builder = builder.with_input(inputdataset)
builder = builder.with_new_output(outputdataset, "filesystem_managed", format_option_id="csv")

recipe = builder.create()
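
The example above reads only the first item of the "main" role. A recipe such as a join can have several inputs, so a more general extraction could look like the minimal sketch below. It assumes the recipe JSON follows the same inputs/outputs, role, items, ref layout used above. collect_refs is a hypothetical helper, not part of the Dataiku API, and the dataset names are made up.

```python
def collect_refs(recipe_proto, side):
    """Return every dataset ref under recipe_proto[side], across all roles.

    `side` is "inputs" or "outputs". Missing keys yield an empty list.
    """
    refs = []
    for role in recipe_proto.get(side, {}).values():
        for item in role.get("items", []):
            refs.append(item["ref"])
    return refs


# Illustrative recipe definition (hypothetical dataset names)
example_proto = {
    "type": "join",
    "inputs": {"main": {"items": [{"ref": "orders"}, {"ref": "customers"}]}},
    "outputs": {"main": {"items": [{"ref": "orders_joined"}]}},
}

print(collect_refs(example_proto, "inputs"))
print(collect_refs(example_proto, "outputs"))
```

Each returned ref could then be passed to builder.with_input() in a loop instead of a single call.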

 

Thanks,

Roy

suchi92
Level 1
Author

Hi @RoyE

Thanks for your reply. The Dataiku instance that I have been given only allows workgroup projects to be created through a macro. One of the reasons is that we need to follow certain naming conventions within the workgroup, so project creation is better handled through the macro.

My original idea was to import the project zip from an S3 bucket, load it in a notebook, take the recipes' json and shaker / join files, and change the project keys, after which it should probably work. However, when I tried to do that, I got the error shown above.
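
For the key-change step, a minimal sketch of what I mean is below. It assumes the old project key shows up either as the value of a "projectKey" field or as a "KEY." prefix on a "ref" value; I have not confirmed that these are the only places it appears in an exported recipe. rekey is a hypothetical helper, and the key and dataset names are placeholders.

```python
def rekey(obj, old_key, new_key):
    """Return a copy of a loaded JSON structure with the project key swapped.

    Only two locations are rewritten (both are assumptions about the layout):
      - a "projectKey" field whose value equals old_key
      - a "ref" string prefixed with old_key + "."
    """
    if isinstance(obj, dict):
        out = {}
        for name, value in obj.items():
            if name == "projectKey" and value == old_key:
                out[name] = new_key
            elif (name == "ref" and isinstance(value, str)
                  and value.startswith(old_key + ".")):
                out[name] = new_key + value[len(old_key):]
            else:
                out[name] = rekey(value, old_key, new_key)
        return out
    if isinstance(obj, list):
        return [rekey(element, old_key, new_key) for element in obj]
    return obj


# Placeholder recipe definition
old_proto = {
    "projectKey": "OLD_PROJECT",
    "inputs": {"main": {"items": [{"ref": "OLD_PROJECT.orders"}]}},
    "outputs": {"main": {"items": [{"ref": "orders_prepared"}]}},
}
new_proto = rekey(old_proto, "OLD_PROJECT", "NEW_PROJECT")
print(new_proto)
```

Refs without a project prefix are left unchanged, and the original dictionary is not modified.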

In the code that you have attached, it seems that I need to adjust the recipe steps based on the recipe type: if it is of the "grouping" type, then I need to call the with_group_key function. Similarly, if the recipe type is "prepare", I may need the add_filter_on_bad_meaning function (depending on my recipe steps). I think it would be great to have a function that takes both the json and the shaker file directly to create the recipe.

ATsao
Dataiker

Hi Suchi92, 

This specific error occurs because you need to create the output dataset first (using the create_dataset() method). For example, the following sample code reads the corresponding config file for a sync recipe, creates the output dataset, and then uses create_recipe() to create the sync recipe.

import dataiku
import json

# Establish the client
client = dataiku.api_client()
project = client.get_project("PROJECT_KEY")

# Read the local files
with open("/PATH/TO/FILE.json") as f:
    recipe_proto = json.load(f)
# The .sync file is empty, so it is not read here
creation_settings = {}  # Nothing needed for sync, so pass an empty dict

# Retrieving output dataset from config file and creating it first
output_dataset = recipe_proto["outputs"]["main"]["items"][0]["ref"]

# Set the necessary connection params
params = {'connection' : 'connection_name', 
          'path': dataiku.default_project_key() + "/" + output_dataset, 
          'table' : output_dataset, 
          'mode' : 'table'}

# Create the dataset
dataset = project.create_dataset(output_dataset, type = 'PostgreSQL', params = params)

# Set dataset to managed
ds_def = dataset.get_definition()
ds_def['managed'] = True
dataset.set_definition(ds_def)

# Create recipe
project.create_recipe(recipe_proto, creation_settings)

 

However, it's worth mentioning that there are some known issues with using this method to create a prepare recipe if the output already exists (fixing this is in our backlog). Additionally, you would still need to adapt the logic to the recipe type anyway (perhaps with a set of conditionals that check the recipe type), so you may be better off using the new_recipe() method as previously mentioned. In other words, you won't be able to simply read in the corresponding json and recipe config file, even if you use the create_recipe() function directly.
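
As a rough illustration of the per-type conditionals, here is a minimal sketch that uses a lookup table instead of an if/elif chain. The configurator functions, the settings dictionary, and its "group_key" field are hypothetical names chosen for this example. with_group_key is the grouping builder method mentioned earlier in this thread. RecordingBuilder is only a stand-in so the sketch runs outside DSS; in practice you would pass the object returned by project.new_recipe().

```python
class RecordingBuilder:
    """Stand-in for a recipe builder; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def with_group_key(self, group_key):
        self.calls.append(("with_group_key", group_key))
        return self


def configure_sync(builder, settings):
    # A sync recipe needs no type-specific settings in this sketch
    return builder


def configure_grouping(builder, settings):
    # "group_key" is a hypothetical field name; adapt it to your own files
    builder.with_group_key(settings["group_key"])
    return builder


# One entry per recipe type you need to support
CONFIGURATORS = {
    "sync": configure_sync,
    "grouping": configure_grouping,
}


def configure_builder(builder, recipe_type, settings):
    """Apply the type-specific configuration, failing loudly on unknown types."""
    if recipe_type not in CONFIGURATORS:
        raise ValueError("No configurator for recipe type %r" % recipe_type)
    return CONFIGURATORS[recipe_type](builder, settings)


demo = configure_builder(RecordingBuilder(), "grouping", {"group_key": "customer_id"})
print(demo.calls)
```

Supporting another recipe type then means adding one function and one table entry, rather than another branch.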

Best,
Andrew
