Using Dataiku
- I am writing a recipe to retrain a custom model created in a Python recipe. This recipe creates a pickle file that contains the model, which is then read by a downstream model that performs prediction…Solution by Marlan
Hi @Erlebacher,
Our approach in this situation is to write the pickle file to a DSS Folder and then build the Folder (which is one of the Build options).
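For illustration, here is a minimal sketch of that approach using the dataiku Folder API; the folder name "models", the file name "model.pkl", and the trained model object are placeholders:

```python
import pickle
import dataiku

# In the training recipe: serialize the trained model into a
# managed folder ("models" is a placeholder folder name).
folder = dataiku.Folder("models")
with folder.get_writer("model.pkl") as writer:
    writer.write(pickle.dumps(trained_model))  # trained_model: your estimator

# In the downstream prediction recipe: read the pickle back.
folder = dataiku.Folder("models")
with folder.get_download_stream("model.pkl") as stream:
    model = pickle.loads(stream.read())
```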
Marlan
- I would like to have a project variable that is a list. How is that done? To be clear, I'd like a list such as: "list": "['a','b','c']". I thought that this would work given that JSON interprets this a…Solution by Zach
Hi @Erlebacher,
You can load a project variable that's a list in Python by using `json.loads()`.
For example, if I have a project variable that looks like this:

```json
{ "keep_cols": ["a", "b", "c"] }
```

I can load it using the following code:

```python
import json
import dataiku

vars = dataiku.get_custom_variables()
keep_cols = json.loads(vars["keep_cols"])
print("keep_cols: ", keep_cols)
```
Thanks,
Zach
- I have a dataset with 10 columns, and would like to remove rows that differ only in the first two columns. So I use the "Distinct Recipe" for this task. I get the result I want, except that the output…
- While trying to run a scenario remotely, the job is failing. (PFB point of failure from stack trace) [2022/12/05-10:52:06.889] [qtp1423491597-38] [INFO] [dku.job.slave] - Trying to obtain grant: datas…
- Consider a section of my flow: I right-click on the Python recipe furthest to the right (it would be nice if all recipes had a default label that could be changed. Not clear why that cannot be done). I…Solution by Zach
Hi @Erlebacher,
When you right-clicked the recipe, did you pick the "Build Flow outputs reachable from here" option?
It's expected behavior that this option only shows the furthest-downstream outputs (this is what "Flow outputs" means). Nonfiltered_rankings isn't shown because the Output folder is further downstream than it is.
If you want to see all of the datasets that will be built, you can choose the "PREVIEW" option.
Thanks,
Zach
- Where can I find a detailed explanation of the project duplication option? For example, what are project resources? In addition: my flow has an input folder with several csv files. I assumed that the…
- I upload a file from my laptop (it is a csv file), and from it, I create a Dataset. I do this several times. I would now like to right-click on one of these datasets and find out the csv file from whi…Solution by Zach
Hi @Erlebacher,
You can see which CSV file was used to create a dataset by going to the dataset's settings, where the uploaded source file is listed.
Thanks,
Zach
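If you'd rather check programmatically, here is a rough sketch using the public API client; "my_dataset" is a placeholder name, and the exact keys under params depend on the dataset type:

```python
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# For an "Uploaded files" dataset, the source file name(s)
# appear in the dataset's raw parameters.
settings = project.get_dataset("my_dataset").get_settings()
print(settings.get_raw()["params"])
```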
- I'm trying to import a project that has been previously deleted, but I'm unable to because the project key still exists. How do I delete the project key? Operating system used: RHEL 7.9 Last answer by Zach
Hi @VickeyC,
Normally when a project is deleted, the key can be reused. Are you sure that the project has actually been deleted?
If you're not a DSS administrator, you can only see projects that you have access to. In this case, please check with an administrator to verify if the project exists.
It's also possible that there's an existing project that is using the key, but has a different display name. You can run the following code in a Python notebook to list all existing projects and show their keys & names. The notebook must be created by a DSS administrator; otherwise it will only list projects that you have access to.
```python
import dataiku

client = dataiku.api_client()
for project in client.list_projects():
    print("Key:", project["projectKey"])
    print("Display name:", project["name"])
    print("")
```
Thanks,
Zach
- Hi, I am trying to build a scenario in which I want two data sets to be built. The datasets are built using SQL code recipes already written. I am wondering if there is any way in which these two datas…Solution by Miguel Angel
Hi,
You can add more than one dataset to be built in the same Scenario step. However, this does not mean they will be built exactly in parallel: it depends on the number of activities the job has to carry out, e.g. each dataset may have a different number of upstream datasets that need to be built first.
Run conditions control the behaviour of Scenario steps. The 'Always' condition ensures that a step will be run (or attempted) regardless of the outcome of preceding steps.
More information can be found in the documentation: https://doc.dataiku.com/dss/latest/scenarios/step_flow_control.html#run-step-conditionally
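As a sketch of the same idea from the public API (not necessarily what the scenario step does internally), a single job can be given several outputs, letting DSS compute the activities and parallelize where dependencies allow; the dataset names are placeholders:

```python
import dataiku

client = dataiku.api_client()
project = client.get_default_project()

# One job, two outputs: DSS schedules the required activities
# and runs them in parallel where the Flow allows it.
job = project.new_job("RECURSIVE_BUILD")
job.with_output("dataset_a")  # placeholder dataset name
job.with_output("dataset_b")  # placeholder dataset name
job.start_and_wait()
```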
- I have a dataset with USER and ITEMS. I wish to perform a groupby and use size() or count() aggregation to find the count in each group. But I wish to then create a new column in the original dataset …Solution by Miguel Angel
Hi Erlebacher,
The Prepare Recipe is primarily used for row-level operations, though some processors can operate across rows. For aggregations, the Group or Window Recipes are more appropriate: both can do counts and other aggregations out of the box, and you can also write your own custom aggregations.
Regarding writing the transformation back into the original dataset: this runs counter to the way the Flow is laid out. You can sometimes force it by pointing a recipe's output dataset at the same data location as its input, but overlapping datasets can cause problems for the Flow's lineage.
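For a code-recipe alternative, here is a minimal pandas sketch that mimics a Window-style aggregation (the per-group count is kept on every row); the dataset and column names are placeholders:

```python
import dataiku

# Read the input dataset ("user_items" is a placeholder name).
df = dataiku.Dataset("user_items").get_dataframe()

# Count rows per USER and attach the result to every original row,
# similar to a Window recipe partitioned by USER.
df["user_item_count"] = df.groupby("USER")["ITEMS"].transform("count")

# Write to a separate output dataset rather than back to the input,
# to keep the Flow's lineage clean.
dataiku.Dataset("user_items_counted").write_with_schema(df)
```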