
Divide a dataset into two outputs based on a numeric criterion

Solved!
InesB
Dataiker

A question often asked in the Dataiku Community by former Alteryx users is how to divide a dataset into two outputs. For example: one with records where the 'Sales' column value is above 1000, and another with the rest of the records. This is easily done with the Filter tool in Alteryx.

 

1 Solution
InesB
Dataiker
Author

The answer is easier than you think: if the goal is to divide a dataset in two, the Split recipe might be what you need.

In Dataiku, to divide your dataset into two outputs based on the 'Sales' column value, you should use the Split recipe.

In your project, select the dataset you want to split and choose the "Split" recipe from the list of available recipes. Name the output datasets as you like.

Then in the Split recipe configuration, define the condition for the first output. For instance, set the condition as Sales > 1000 to create the first output with records where the 'Sales' column value is greater than 1000.

For the second output, there is no need to set another condition (Sales <= 1000) to capture the rest of the records. Once the first condition, Sales > 1000, is defined, the rows that do not match it go to the second output automatically.

Once you have set the conditions, run the Split recipe. Dataiku will generate two separate datasets based on your specified conditions.
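
If it helps to see the logic as code, here is a minimal sketch in plain Python of what the split does conceptually: one condition, with every non-matching row falling into the second output. This is only an illustration, not what Dataiku runs internally; the function name split_rows, the order_id field, and the sample values are made up for the example.

```python
def split_rows(rows, column="Sales", threshold=1000):
    """Return (matching, rest): rows where row[column] > threshold, then all others.

    Hypothetical helper for illustration only; it is not part of any Dataiku API.
    """
    matching, rest = [], []
    for row in rows:
        # Only one condition is tested; anything that fails it lands in `rest`,
        # so no explicit "<= threshold" condition is needed.
        if row[column] > threshold:
            matching.append(row)
        else:
            rest.append(row)
    return matching, rest


# Made-up sample records standing in for the input dataset.
orders = [
    {"order_id": 1, "Sales": 250},
    {"order_id": 2, "Sales": 1000},
    {"order_id": 3, "Sales": 1500},
    {"order_id": 4, "Sales": 4200},
]

high_sales, other_sales = split_rows(orders)
print([r["order_id"] for r in high_sales])   # [3, 4]
print([r["order_id"] for r in other_sales])  # [1, 2]
```

Note that a row with Sales exactly equal to 1000 ends up in the second output, because the condition is strictly greater than 1000.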

