VISUAL RECIPE

Hi, is there a way to exclude the column used to split the dataset in the Split recipe from the outputs, without using a Prepare recipe or a precomputed column?
Answers
-
Hi @JordanG, I am using a Split recipe to split the output into multiple datasets (maybe 10). I don't want the column used for the split to appear in the output datasets, and I'd like to avoid Prepare recipes, since I would need 10 of them here. The Join recipe has an option to select the output columns. Is there a similar option to select the output columns in the Split recipe?
-
JordanG (Dataiker)
@Sujata Unfortunately, the Split recipe has no option for selecting output columns the way the Join recipe does. Does this project need to be rerun frequently or shared with others? If so, it may be worth the time to create a Prepare recipe for each output dataset. You can create the Prepare recipe once and then use the Copy function to repeat it more quickly.
I wouldn't recommend this approach, but you can also modify the output datasets with Python code. You will have to run it manually each time or add the code to a scenario step. It isn't recommended because the change will not show up in the Flow.
import dataiku

client = dataiku.api_client()
project = client.get_project('project_key')
recipe = project.get_recipe('split_recipe_name')

# Loop over every output dataset of the Split recipe
for dataset in recipe.get_settings().get_recipe_outputs()['main']['items']:
    df = dataiku.Dataset(dataset['ref']).get_dataframe()
    # Drop the column that was used for the split ("user" in this example)
    df = df.drop(columns=["user"])
    output = dataiku.Dataset(dataset['ref'])
    output.write_with_schema(df)
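If a code-based step is acceptable anyway, the split and the column drop could probably be done together in pandas, for example inside a single Python recipe with several outputs. The sketch below is independent of DSS and only illustrates the idea; the function name `split_and_drop` and the sample data are hypothetical, and "user" simply mirrors the column name used above.

```python
import pandas as pd


def split_and_drop(df: pd.DataFrame, split_col: str) -> dict:
    """Return one DataFrame per distinct value of split_col, without that column."""
    parts = {}
    # groupby yields (value, sub-DataFrame) pairs; rows with a missing
    # split value are skipped by default.
    for value, group in df.groupby(split_col, sort=False):
        parts[value] = group.drop(columns=[split_col]).reset_index(drop=True)
    return parts


# Hypothetical sample data, not taken from the thread.
sample = pd.DataFrame({"user": ["a", "b", "a"], "amount": [10, 20, 30]})
parts = split_and_drop(sample, "user")

for value, part in parts.items():
    print(value, list(part.columns), len(part))
```

In DSS, each resulting DataFrame would then be written to its own output dataset with `write_with_schema`, as in the snippet above. Unlike the scenario-step approach, a Python recipe is visible in the Flow.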