Data Preparation Split

Level 1
Hi Team,

I have a dataset with 10,000 rows and a month column whose values range from JAN to DEC. I want to split it by month.

I can do this with the visual "Split" recipe, but then I have to create 12 different datasets.

My question: if I want DSS to create datasets according to the distinct values of the month column, how can I do this?

Regards,

Nantha.
Dataiker
Hi,

This is a good use case for partitioning: https://doc.dataiku.com/dss/latest/partitions/index.html

Instead of creating 12 datasets with a Split recipe, you can use a Sync recipe with a partitioned output dataset. In the Settings > Connection menu of the output dataset, configure the partitioning column that contains your month. Use the discrete partition type if your months are encoded as "1" to "12", or the time range partition type if they are encoded as dates ("YYYY-MM").
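To make the difference between the two partition types concrete, here is a rough Python sketch of how a row's month value could map to a partition identifier. It is an illustration only, not DSS code: the function names are hypothetical, and it assumes month-granularity time partitions are identified as "YYYY-MM", as described above.

```python
from datetime import date


def discrete_partition_id(month_value):
    """Discrete dimension (hypothetical helper): each distinct value is its own partition."""
    return str(month_value)


def monthly_time_partition_id(day: date) -> str:
    """Time range dimension at month granularity (hypothetical helper): a date maps to "YYYY-MM"."""
    return f"{day.year:04d}-{day.month:02d}"


# A month stored as 3 lands in partition "3"; a row dated 2020-03-15 lands in "2020-03".
print(discrete_partition_id(3))
print(monthly_time_partition_id(date(2020, 3, 15)))
```

With discrete partitions, rows from March of different years share one partition; the time range variant keeps each year's March separate.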

In the settings of your Sync recipe, make sure you check "Redispatch partitioning according to input columns". You will then be able to build the partitions you select.
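Conceptually, redispatching reads every input row and routes it to the partition named by its value in the partitioning column. The following stdlib-only Python sketch shows that grouping step on a few made-up rows; `redispatch` is a hypothetical helper, not part of the Dataiku API, and DSS does the actual work for you when the option is enabled.

```python
from collections import defaultdict


def redispatch(rows, column):
    """Group rows into partitions keyed by their value in `column` (illustrative only)."""
    partitions = defaultdict(list)
    for row in rows:
        # Each row goes to exactly one partition, named after its column value.
        partitions[str(row[column])].append(row)
    return dict(partitions)


# Made-up sample rows: one dataset, one partition per distinct month.
rows = [
    {"id": 1, "month": "JAN"},
    {"id": 2, "month": "FEB"},
    {"id": 3, "month": "JAN"},
]
parts = redispatch(rows, "month")
for partition_id, partition_rows in sorted(parts.items()):
    print(partition_id, [r["id"] for r in partition_rows])
```

The point of the design is that you end up with one dataset whose partitions follow the data, rather than 12 separate datasets wired up by hand.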
Dataiker
Hi, I suggest you try this tutorial: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html
Level 1
I have a similar need, but I'd like to partition on the value of a text column. I have tens of thousands of records and about 50 record categories that I'd like to use as partitions. In the field for adding a partitioning pattern in the dataset settings, I don't see a way to target a single column or to split the data by discrete text values. Can you offer some advice?
Dataiker
Hi, for that option to appear on a Sync recipe, you first need to partition your output dataset by the dimension you want.
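With a discrete dimension on a text column, each distinct category becomes one partition, so it can help to check how many partitions you would get (roughly 50 in the case above) and how many rows each would hold before building. A minimal Python sketch, assuming the records are available as dictionaries; `preview_partitions` and the sample data are hypothetical:

```python
from collections import Counter


def preview_partitions(rows, column):
    """Return {partition_value: row_count} for a discrete dimension on `column`."""
    return dict(Counter(str(row[column]) for row in rows))


# Hypothetical sample records with a text category column.
records = [
    {"category": "retail"},
    {"category": "wholesale"},
    {"category": "retail"},
]
counts = preview_partitions(records, "category")
print(len(counts), "partitions:", counts)
```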
Level 3
I don't see "Redispatch partitioning according to input columns" in DSS 5.1. Any update on this?