Using Dataiku
- I am writing a recipe to retrain a custom model created in a Python recipe. This recipe creates a pickle file that contains the model, which is then read by a downstream model that performs prediction…Solution by Marlan
Hi @Erlebacher,
Our approach in this situation is to write the pickle file to a DSS Folder and then build the Folder (which is one of the Build options).
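For illustration, here is a minimal sketch of that approach using the dataiku Folder API; the folder name "models", the file name "model.pkl", and the trained model object are placeholders:

```python
import pickle
import dataiku

# In the training recipe: serialize the trained model into a
# managed folder ("models" is a placeholder folder name).
folder = dataiku.Folder("models")
with folder.get_writer("model.pkl") as writer:
    writer.write(pickle.dumps(trained_model))  # trained_model: your estimator

# In the downstream prediction recipe: read the pickle back.
folder = dataiku.Folder("models")
with folder.get_download_stream("model.pkl") as stream:
    model = pickle.loads(stream.read())
```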
Marlan
- I would like to have a project variable that is a list. How is that done? To be clear, I'd like a list such as: "list": "['a','b','c']". I thought that this would work given that JSON interprets this a…Solution by Zach
Hi @Erlebacher,
You can load a project variable that's a list in Python by using `json.loads()`.
For example, if I have a project variable that looks like this:

```json
{ "keep_cols": ["a", "b", "c"] }
```

I can load it using the following code:

```python
import json
import dataiku

vars = dataiku.get_custom_variables()
keep_cols = json.loads(vars["keep_cols"])
print("keep_cols: ", keep_cols)
```
Thanks,
Zach
- I have a dataset with 10 columns, and would like to remove rows that differ only in the first two columns. So I use the "Distinct Recipe" for this task. I get the result I want, except that the output…
- While trying to run a scenario remotely, the job is failing. (PFB point of failure from stack trace) [2022/12/05-10:52:06.889] [qtp1423491597-38] [INFO] [dku.job.slave] - Trying to obtain grant: datas…
- Consider a section of my flow: I right-click on the Python recipe furthest to the right (it would be nice if all recipes had a default label that could be changed. Not clear why that cannot be done). I…Solution by Zach
Hi @Erlebacher,
When you right-clicked the recipe, did you pick the "Build Flow outputs reachable from here" option?
It's expected behavior that this option only shows the furthest-downstream outputs (this is what "Flow outputs" means). Nonfiltered_rankings isn't shown because the Output folder is further downstream than it is.
If you want to see all of the datasets that will be built, you can choose the "PREVIEW" option.
Thanks,
Zach
- Where can I find a detailed explanation of the project duplication option? For example, what are project resources? In addition: my flow has an input folder with several csv files. I assumed that the…
- I upload a file from my laptop (it is a csv file), and from it, I create a Dataset. I do this several times. I would now like to right-click on one of these datasets and find out the csv file from whi…Solution by Zach
Hi @Erlebacher,
You can see which CSV file was used to create a dataset by going to the dataset's settings, where the uploaded source file is listed.
Thanks,
Zach
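If you'd rather check programmatically, here is a rough sketch using the public API client; "my_dataset" is a placeholder name, and the exact keys under params depend on the dataset type:

```python
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# For an "Uploaded files" dataset, the source file name(s)
# appear in the dataset's raw parameters.
settings = project.get_dataset("my_dataset").get_settings()
print(settings.get_raw()["params"])
```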
- I'm trying to import a project that has been previously deleted, but I'm unable to because the project key still exists. How do I delete the project key? Operating system used: RHEL 7.9 Last answer by Zach
Hi @VickeyC,
Normally when a project is deleted, the key can be reused. Are you sure that the project has actually been deleted?
If you're not a DSS administrator, you can only see projects that you have access to. In this case, please check with an administrator to verify if the project exists.
It's also possible that there's an existing project that is using the key, but has a different display name. You can run the following code in a Python notebook to list all existing projects and show their keys & names. The notebook must be created by a DSS administrator; otherwise it will only list projects that you have access to.
```python
import dataiku

client = dataiku.api_client()
for project in client.list_projects():
    print("Key:", project["projectKey"])
    print("Display name:", project["name"])
    print("")
```
Thanks,
Zach
- Hi, I am trying to build a scenario in which I want two data sets to be built. The datasets are built using SQL code recipes already written. I am wondering if there is any way in which these two datas…Solution by Miguel Angel
Hi,
You can add more than one dataset to be built in the same Scenario step. However, this does not mean they will be built exactly in parallel: it depends on the number of activities the job has to carry out, e.g. each dataset may have a different number of upstream datasets that need to be built first.
Run conditions control the behaviour of Scenario steps. The 'Always' condition ensures that a step will be run (or attempted) regardless of the outcome of preceding steps.
More information can be found in the documentation: https://doc.dataiku.com/dss/latest/scenarios/step_flow_control.html#run-step-conditionally
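As a sketch of the same idea from the public API (not necessarily what the scenario step does internally), a single job can be given several outputs, letting DSS compute the activities and parallelize where dependencies allow; the dataset names are placeholders:

```python
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# One job, two outputs: DSS schedules the required activities
# and runs them in parallel where the Flow allows it.
job = project.new_job("RECURSIVE_BUILD")
job.with_output("dataset_a")  # placeholder dataset name
job.with_output("dataset_b")  # placeholder dataset name
job.start_and_wait()
```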
- I have a dataset with USER and ITEMS. I wish to perform a groupby and use size() or count() aggregation to find the count in each group. But I wish to then create a new column in the original dataset …Solution by Miguel Angel
Hi Erlebacher,
The Prepare Recipe is primarily used for row-level operations, though some processors can operate across rows. For aggregations, the Group or Window Recipes are more appropriate: both can do counts and other aggregations out of the box, and you can also write your own custom aggregations.
Regarding writing the transformation back into the original dataset: this runs counter to the way the Flow is laid out. You can sometimes force it by pointing a recipe's output dataset at the same data location as its input, but overlapping datasets can cause problems for the Flow's lineage.
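For a code-recipe alternative, here is a minimal pandas sketch that mimics a Window-style aggregation (the per-group count is kept on every row); the dataset and column names are placeholders:

```python
import dataiku

# Read the input dataset ("user_items" is a placeholder name).
df = dataiku.Dataset("user_items").get_dataframe()

# Count rows per USER and attach the result to every original row,
# similar to a Window recipe partitioned by USER.
df["user_item_count"] = df.groupby("USER")["ITEMS"].transform("count")

# Write to a separate output dataset rather than back to the input,
# to keep the Flow's lineage clean.
dataiku.Dataset("user_items_counted").write_with_schema(df)
```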