Using Dataiku

  • I am writing a recipe to retrain a custom model created in a Python recipe. This recipe creates a pickle file that contains the model, which is then read by a downstream model that performs prediction…
    Answered ✓
    Started by Erlebacher
    Most recent by Marlan
    0
    4
    Solution by Marlan

    Hi @Erlebacher,

    Our approach in this situation is to write the pickle file to a DSS Folder and then build the Folder (which is one of the Build options).
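
    A minimal sketch of that approach (the folder name "models" and the trained "model" object are assumptions, not part of the original answer):

    import pickle
    import dataiku

    # Write the pickled model into a DSS managed folder ("models" is hypothetical)
    folder = dataiku.Folder("models")
    with folder.get_writer("model.pkl") as writer:
        writer.write(pickle.dumps(model))  # 'model' is your trained object

    # Downstream, the prediction recipe can read it back:
    # with folder.get_download_stream("model.pkl") as stream:
    #     model = pickle.loads(stream.read())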

    Marlan

  • I would like to have a project variable that is a list. How is that done? To be clear, I'd like a list such as: "list": "['a','b','c']" I thought that this would work given that Json interprets this a…
    Answered ✓
    Started by Erlebacher
    Most recent by Zach
    0
    1
    Solution by Zach

    Hi @Erlebacher,

    You can load a project variable that's a list in Python by using json.loads().

    For example, if I have a project variable that looks like this:

    {
      "keep_cols": ["a", "b", "c"]
    }

    I can load it using the following code:

    import json
    import dataiku
    
    vars = dataiku.get_custom_variables()
    keep_cols = json.loads(vars["keep_cols"])
    print("keep_cols: ", keep_cols)

    Thanks,

    Zach

  • I have a dataset with 10 columns, and would like to remove rows that differ only in the first two columns. So I use the "Distinct Recipe" for this task. I get the result I want, except that the output…
    Answered ✓
    Started by Erlebacher
    Most recent by Erlebacher
    0
    7
    Solution by Erlebacher

    Excellent trick to be sure! Nonetheless, it is surprising to me that Dataiku would not have added a simple option in the "distinct" recipe that says "keep all columns". Simple omission, or was there a clear reason for the decision? I wonder.

  • While trying to run a scenario remotely, the job is failing. (PFB point of failure from stack trace) [2022/12/05-10:52:06.889] [qtp1423491597-38] [INFO] [dku.job.slave] - Trying to obtain grant: datas…
    Answered ✓
    Started by yjagger
    Most recent by yjagger
    0
    1
    Solution by yjagger

    I noticed that the imported scenario was set to run as someone else.

    Once I changed it to Run as => my user, it was fine.

    Screenshot attached for reference

  • Consider a section of my flow: I right-click on the Python recipe furthest to the right (it would be nice if all recipes had a default label that could be changed. Not clear why that cannot be done). I…
    Answered ✓
    Started by Erlebacher
    Most recent by Zach
    0
    1
    Solution by Zach

    Hi @Erlebacher,

    When you right-clicked the recipe, did you pick the "Build Flow outputs reachable from here" option?

    It's expected behavior that this option only shows the furthest downstream outputs (this is what "Flow outputs" means). Nonfiltered_rankings isn't shown because the Output folder is further downstream.

    If you want to see all of the datasets that will be built, you can choose the "PREVIEW" option:

    [Screenshot: the "PREVIEW" option in the Build dialog]

    Thanks,

    Zach

  • Where can I find a detailed explanation of the project duplication option? For example, what are project resources? In addition: my flow has an input folder with several csv files. I assumed that the…
    Question
    Started by Erlebacher
    0
  • I upload a file from my laptop (it is a csv file), and from it, I create a Dataset. I do this several times. I would now like to right-click on one of these datasets and find out the csv file from whi…
    Answered ✓
    Started by Erlebacher
    Most recent by Erlebacher
    0
    2
    Solution by Zach

    Hi @Erlebacher,

    You can see what CSV file was used to create a dataset by going to the dataset settings, as shown below:

    [Screenshot: the dataset's Settings tab, showing the source CSV file]

    Thanks,

    Zach

  • I'm trying to import a project that has been previously deleted, but I'm unable to because the project key still exists. How do I delete the project key? Operating system used: RHEL 7.9
    Question
    Started by VickeyC
    Most recent by Zach
    0
    1
    Last answer by Zach

    Hi @VickeyC,

    Normally when a project is deleted, the key can be reused. Are you sure that the project has actually been deleted?

    If you're not a DSS administrator, you can only see projects that you have access to. In this case, please check with an administrator to verify if the project exists.

    It's also possible that there's an existing project that is using the key, but has a different display name. You can run the following code in a Python notebook to list all existing projects and show their keys & names. The notebook must be created by a DSS administrator; otherwise it will only list projects that you have access to.

    import dataiku
    
    client = dataiku.api_client()
    
    for project in client.list_projects():
        print("Key:", project["projectKey"])
        print("Display name:", project["name"])
        print("")

    Thanks,

    Zach

  • Hi, I am trying to build a scenario in which I want two data sets to be built. The datasets are built using sql code recipe already written. I am wondering if there is any way in which these two datas…
    Answered ✓
    Started by yjagger
    Most recent by Miguel Angel
    0
    1
    Solution by Miguel Angel

    Hi,

    You can add more than one dataset to be built in the same Scenario step. However, this does not mean their builds will run exactly in parallel; it depends on the number of activities the job has to carry out, e.g. each dataset may have a different number of upstream datasets that need to be built first.

    Run conditions control the behaviour of Scenario steps. The ‘Always’ condition ensures that a step will be run (or attempted) regardless of the outcome of preceding steps.

    More information can be found in the documentation: https://doc.dataiku.com/dss/latest/scenarios/step_flow_control.html#run-step-conditionally
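
    If the goal is to launch both builds at the same time, another option is a custom Python scenario step using asynchronous builds. A minimal sketch, assuming hypothetical dataset names "dataset_a" and "dataset_b":

    import time
    from dataiku.scenario import Scenario

    scenario = Scenario()

    # Start both builds without waiting for the first to finish
    build_a = scenario.build_dataset("dataset_a", asynchronous=True)
    build_b = scenario.build_dataset("dataset_b", asynchronous=True)

    # Poll until both builds have completed
    while not (build_a.is_done() and build_b.is_done()):
        time.sleep(5)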

  • I have a dataset with USER and ITEMS. I wish to perform a groupby and use size() or count() aggregation to find the count in each group. But I wish to then create a new column in the original dataset …
    Answered ✓
    Started by Erlebacher
    Most recent by Miguel Angel
    0
    1
    Solution by Miguel Angel

    Hi Erlebacher,

    The Prepare Recipe is primarily used for row operations, though there are some processors which can operate across rows. In order to do aggregations, the Group or Window Recipes are more appropriate. Both can do counts and other aggregations out of the box, and you can also write your own custom aggregations.

    Regarding transforming the original dataset in place: this runs counter to the way the Flow is laid out. At times you can still power through by pointing a recipe's output dataset to the same data location as its input. However, overlapping datasets can cause problems for the Flow's lineage.
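
    For reference, the Window-style "count per group, kept on every row" can also be done in a Python recipe with a pandas groupby transform. A minimal sketch, with hypothetical dataset names:

    import dataiku

    # Read the input dataset ("user_items" is a placeholder name)
    df = dataiku.Dataset("user_items").get_dataframe()

    # Broadcast the per-USER row count back onto every row
    # (a Window-style aggregation, not a Group-style collapse)
    df["USER_COUNT"] = df.groupby("USER")["ITEMS"].transform("count")

    # Write to a separate output dataset to keep the Flow lineage clean
    dataiku.Dataset("user_items_counted").write_with_schema(df)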
