VISUAL RECIPE

Sujata Registered Posts: 2 ✭✭

Hi, is there a way to exclude the column used to split the dataset in the Split recipe without using a Prepare recipe or a precomputed column?

Answers

  • JordanG
    JordanG Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Dataiker

    Hey @Sujata can you provide more information on what you're trying to accomplish? I don't fully understand what you're attempting.

  • Sujata
    Sujata Registered Posts: 2 ✭✭

    Hi @JordanG, I am using the Split recipe to split the output into multiple datasets (maybe 10). At the same time, I don't want the column used for the split to appear in the output datasets, and I'd like to avoid the Prepare recipe, since I would need 10 of them here. The Join recipe has an option to select the output columns. Is there a similar option to select the output columns in the Split recipe?

  • JordanG
    JordanG Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 7 Dataiker
    edited August 28

    @Sujata Unfortunately, the Split recipe has no option to select output columns the way the Join recipe does. Does this project need to be rerun frequently or shared with others? If so, it may be worth spending the time to create a Prepare recipe for each resulting dataset. You can build the Prepare recipe once and then use the Copy function to create the others more quickly.

    I probably wouldn't recommend this approach, but you can also modify the output tables with Python code. You would have to run it manually each time or add it to a scenario step, as shown in the screenshot below. The drawback is that the change will not show up in the Flow.

    Screenshot 2025-08-27 at 21.47.55.png
    import dataiku
    
    # Replace 'project_key' and 'split_recipe_name' with your own values
    client = dataiku.api_client()
    project = client.get_project('project_key')
    recipe = project.get_recipe('split_recipe_name')
    
    # Loop over every output dataset of the Split recipe
    for item in recipe.get_settings().get_recipe_outputs()['main']['items']:
        output = dataiku.Dataset(item['ref'])
        df = output.get_dataframe()
        # Drop the column used for the split ("user" in this example)
        df = df.drop(columns=["user"])
        # Overwrite the dataset and update its schema
        output.write_with_schema(df)
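
    If it helps to check the drop logic outside DSS first, here is a minimal pandas-only sketch of "split on a column, then remove that column from every part". The function name `split_and_drop` and the sample column names are hypothetical, chosen only for illustration; nothing here calls the Dataiku API.

```python
import pandas as pd


def split_and_drop(df, split_column):
    """Group rows by split_column and return {value: sub-frame without that column}.

    Rows whose split_column is missing (NaN) are skipped, which is the
    default behaviour of pandas groupby.
    """
    parts = {}
    for value, group in df.groupby(split_column):
        # Remove the split column and renumber the rows of each part
        parts[value] = group.drop(columns=[split_column]).reset_index(drop=True)
    return parts


# Small example frame; "user" plays the role of the split column
sample = pd.DataFrame({"user": ["a", "b", "a"], "amount": [1, 2, 3]})
parts = split_and_drop(sample, "user")
```

    Inside DSS, each resulting frame could then be written with `write_with_schema`, the same call used in the snippet above. How split values map to output dataset names is left to you.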
    