I am new to the Dataiku API. I have tried some simple examples, such as creating and deleting a dataset. I also found one of your Python examples that creates a recipe and sets its inputs and outputs:
    from dataikuapi import GroupingRecipeCreator

    builder = GroupingRecipeCreator('test_group', project)
    builder = builder.with_input("input_dataset_name")
    builder = builder.with_new_output("output_dataset_name", "hdfs_managed", format_option_id="PARQUET_HIVE")
    builder = builder.with_group_key("quantity")  # the recipe is created with one grouping key
    recipe = builder.build()
Basically, the builder object helps to create a recipe. But is there any way to run a recipe? Or is it only possible to run it manually in Dataiku?
Running a flow of datasets, recipes and models in Dataiku revolves around two concepts: Jobs and Scenarios. In the API, you do not run a recipe directly; instead, you build its output, using either a Job or a Scenario.
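As a rough sketch of building a recipe's output with a Job: the snippet below assumes `project` is a project handle such as the one returned by `client.get_project(...)`, and that the output is an unpartitioned dataset. It follows the `project.start_job` / `job.get_status()` pattern from the jobs documentation. The helper names `build_job_definition` and `build_output` are only illustrative.

```python
import time


def build_job_definition(output_dataset, job_type="NON_RECURSIVE_FORCED_BUILD"):
    """Describe a job that builds one unpartitioned dataset."""
    return {
        "type": job_type,
        "outputs": [
            # "NP" is the partition identifier for non-partitioned datasets
            {"id": output_dataset, "type": "DATASET", "partition": "NP"}
        ],
    }


def build_output(project, output_dataset, poll_seconds=1):
    """Start a job that builds `output_dataset` and poll until it finishes.

    `project` is assumed to be a project handle from the public API client.
    Returns the final job state: "DONE", "FAILED" or "ABORTED".
    """
    job = project.start_job(build_job_definition(output_dataset))
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in ("DONE", "FAILED", "ABORTED"):
            return state
        time.sleep(poll_seconds)
```

With the recipe from the question, this would be called as `build_output(project, "output_dataset_name")`, which runs the grouping recipe as a side effect of building its output dataset.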
If you plan on using different elements of the API to create a Dataiku project, test it and automate it, I would advise:
1. Creating the datasets, recipes and models using https://doc.dataiku.com/dss/latest/publicapi/client-python/datasets.html, https://doc.dataiku.com/dss/latest/publicapi/client-python/recipes.html and https://doc.dataiku.com/dss/latest/publicapi/client-python/ml.html
2. Building/training some datasets/models by launching Jobs that build the output(s) of the recipe: https://doc.dataiku.com/dss/latest/publicapi/client-python/jobs.html
3. Creating a scenario to automate the update of datasets and models: https://doc.dataiku.com/dss/latest/publicapi/client-python/scenarios.html
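For the last step, here is a minimal sketch of triggering a scenario from the API once it exists in the project. It assumes `project` is a project handle from the public API client and that a scenario with the given id has already been created (for example in the interface). The function name `run_scenario` and the scenario id in the usage note are hypothetical.

```python
def run_scenario(project, scenario_id):
    """Run an existing scenario and block until the run has finished.

    `project` is assumed to be a project handle from the public API client;
    `scenario_id` must refer to a scenario that already exists in it.
    Returns whatever handle `run_and_wait()` gives back for the finished run.
    """
    scenario = project.get_scenario(scenario_id)
    return scenario.run_and_wait()
```

A call might look like `run_scenario(project, "REBUILD_OUTPUTS")`, where the scenario contains a build step targeting the recipe's output dataset.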
In general, it may be faster to use the interface to initialize a "template project", including its scenarios. You can then copy this template several times and apply programmatic changes to each copy using the API.