Data Preparation Split
fornanthu
Registered Posts: 20 ✭✭✭✭
Hi Team,
I have a dataset with 10,000 rows, and it has a month column whose values range from JAN to DEC. I want to split the dataset by month.
I can do this with the "Split" visual recipe, but I would have to create 12 different output datasets myself.
My question: how can I get DSS to create the datasets automatically, one per distinct value of the month column?
Regards,
Nantha.
Answers
-
Hi,
This is a good use case for partitioning: https://doc.dataiku.com/dss/latest/partitions/index.html
Instead of creating 12 datasets with a split recipe, you can use a sync recipe with a partitioned output dataset. In the Settings > Connection menu of the output dataset, configure partitioning on the column containing your month. Use the discrete partition type if your months are encoded like "1" to "12", or the time range partition type if they are encoded like dates ("YYYY-MM").
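Since the original question has months written as JAN to DEC, the column may need re-encoding before either partition type fits. Here is a minimal sketch in plain Python, not a Dataiku API: the helper names `month_to_discrete` and `month_to_time_id` are hypothetical, and the year is an assumption because the question does not mention one.

```python
# Hypothetical helpers; nothing here is part of DSS.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def month_to_discrete(label):
    """Map 'JAN'..'DEC' to '1'..'12' (discrete-style identifiers)."""
    return str(MONTHS.index(label.strip().upper()) + 1)

def month_to_time_id(label, year):
    """Map 'JAN'..'DEC' plus an assumed year to 'YYYY-MM'."""
    month_number = MONTHS.index(label.strip().upper()) + 1
    return f"{year:04d}-{month_number:02d}"

print(month_to_discrete("JAN"))       # 1
print(month_to_time_id("dec", 2019))  # 2019-12
```

An unknown label raises `ValueError`, which is probably what you want here: a row with a bad month should not silently land in a partition.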
In the settings of your sync recipe, make sure you enable "Redispatch partitioning according to input columns". You will then be able to build the partitions you select.
-
I don't see "Redispatch partitioning according to input columns" in DSS 5.1. Any update on this?
-
Hi, for that option to appear in a Sync recipe, you first need to partition the output dataset by the dimension you want.
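To make the idea concrete: redispatching amounts to routing each input row to the partition named by the value in its partitioning column. The sketch below shows that routing in plain Python. It is only an illustration of the concept, not how DSS implements it, and `redispatch_by_column` is a hypothetical name.

```python
from collections import defaultdict

def redispatch_by_column(rows, column):
    """Group rows (dicts) into one bucket per distinct value of `column`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)

# Made-up sample rows, for illustration only.
rows = [
    {"month": "JAN", "amount": 10},
    {"month": "FEB", "amount": 7},
    {"month": "JAN", "amount": 3},
]

partitions = redispatch_by_column(rows, "month")
for month, members in partitions.items():
    print(month, len(members))
```

The set of partitions comes from the data itself, so nobody has to list the 12 months (or any other values) up front.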
-
I have a similar need, but I'd like to partition on the value of a text column. I have tens of thousands of records and about 50 record categories that I'd like to use as partitions. In the dataset settings, in the field for adding a partitioning pattern, I don't see a way to point at a single column or to split by discrete text values. Can you offer some advice?
-
Hi, I suggest you try this tutorial: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html
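For the text-category case, it can help to picture the end result as one folder per distinct category value. The following is a rough sketch in plain Python using only the standard library. It is an assumption-laden illustration, not Dataiku code: `write_partitions`, the `data.csv` file name, and the sample rows are all hypothetical, and category values are assumed to be safe to use as folder names.

```python
import csv
import os
import tempfile
from collections import defaultdict

def write_partitions(rows, column, root):
    """Write one CSV per distinct value of `column` at root/<value>/data.csv."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)

    paths = {}
    for value, members in groups.items():
        folder = os.path.join(root, value)  # assumes value is a valid folder name
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, "data.csv")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(members[0].keys()))
            writer.writeheader()
            writer.writerows(members)
        paths[value] = path
    return paths

# Made-up sample rows, for illustration only.
rows = [
    {"category": "billing", "id": 1},
    {"category": "support", "id": 2},
    {"category": "billing", "id": 3},
]

root = tempfile.mkdtemp()
paths = write_partitions(rows, "category", root)
print(sorted(paths))  # ['billing', 'support']
```

With about 50 categories this produces about 50 folders, each holding only the rows for its category.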