Partition - Discrete dimension example (along a column)

n0thing233 · 07-03-2019

I have a file-based dataset (CSV format). I want to partition this dataset based on the value of a column, which takes 5 values (0, 1, 2, 3, 4).

I followed the tutorial but cannot partition it.

My column name is 'partition'. I clicked on "add discrete dimension", then typed "partition" into the box, and it generated the pattern "%{partition}/.*"

But when I clicked the "list partitions" button, it showed me "

Detected 0 partitions

Found 1 unmatched file:

/out-s0.csv

"

Can anyone help me?

Mattsco · 07-03-2019

Hi,

Your dataset is not yet partitioned. You need to rebuild it to see the generated partitions.
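
As a rough illustration of why "list partitions" reports 0 partitions before the rebuild, here is a minimal Python sketch of how a pattern such as "%{partition}/.*" can be matched against file paths. This is not how DSS implements it: `pattern_to_regex` and `detect_partition` are hypothetical helpers, and both treating the rest of the pattern as a regular expression and ignoring the leading slash are assumptions made for the example.

```python
import re

def pattern_to_regex(pattern):
    """Turn a pattern like '%{partition}/.*' into a compiled regex (illustrative only)."""
    # Split on the %{name} placeholders, keeping them in the result.
    parts = re.split(r"(%\{\w+\})", pattern.lstrip("/"))
    pieces = []
    for part in parts:
        placeholder = re.fullmatch(r"%\{(\w+)\}", part)
        if placeholder:
            # Assumption: a dimension value is one path component (no slash).
            pieces.append("(?P<%s>[^/]+)" % placeholder.group(1))
        else:
            # Assumption: the rest of the pattern is already regex-like (e.g. '.*').
            pieces.append(part)
    return re.compile("".join(pieces))

def detect_partition(pattern, path):
    """Return the dimension values found in path, or None if the file is unmatched."""
    match = pattern_to_regex(pattern).fullmatch(path.lstrip("/"))
    return match.groupdict() if match else None

print(detect_partition("%{partition}/.*", "/out-s0.csv"))    # None
print(detect_partition("%{partition}/.*", "/3/out-s0.csv"))  # {'partition': '3'}
```

A file sitting at the root of the folder has no directory level to bind to the dimension, so it stays unmatched. Once the data is written under one folder per value, the same pattern finds the partition.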

To generate this partitioned dataset, the parent recipe should be a sync recipe (Configuration tab) or a prepare recipe (Advanced tab) with the "redispatch partitioning according to input columns" option activated.
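
Conceptually, redispatching splits the input rows by the value of the partitioning column and writes each group to its own partition. The sketch below mimics that idea with plain Python and the standard csv module. It is not DSS code: `redispatch`, `write_partitions`, the sample rows and the reuse of the "out-s0.csv" file name are illustrative assumptions only.

```python
import csv
import os
import tempfile
from collections import defaultdict

def redispatch(rows, column):
    """Group rows by the value found in `column` (one group per partition)."""
    groups = defaultdict(list)
    for row in rows:
        groups[str(row[column])].append(row)
    return dict(groups)

def write_partitions(groups, out_dir, filename="out-s0.csv"):
    """Write each group to <out_dir>/<value>/<filename>; return the relative paths."""
    written = []
    for value in sorted(groups):
        rows = groups[value]
        part_dir = os.path.join(out_dir, value)
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, filename)
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written.append(os.path.relpath(path, out_dir))
    return written

# Hypothetical sample data with a 'partition' column, as in the question.
sample_rows = [
    {"id": "a", "partition": "0"},
    {"id": "b", "partition": "1"},
    {"id": "c", "partition": "0"},
]
groups = redispatch(sample_rows, "partition")
with tempfile.TemporaryDirectory() as out_dir:
    written = write_partitions(groups, out_dir)
print(sorted(groups))  # ['0', '1']
```

With one folder per value, a pattern like "%{partition}/.*" has a directory level to read the partition identifier from.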

Matt

Mattsco

n0thing233 · 07-03-2019

Sorry, I don't see "redispatch partitioning according to input columns" in my Advanced tab. What could be the reason? My settings say "No settings required".

Mattsco · 07-03-2019

To see it, the output dataset of the recipe needs to be partitioned.
I added a picture to the main answer.

Mattsco

n0thing233 · 07-03-2019

Sorry if these are stupid questions, but the picture you showed me is different from my Dataiku.
In my sync recipe -> Advanced -> Settings there is no checkbox for "redispatch partitioning according to input columns". I would really like to attach my screenshot here. How can I do that?

Mattsco · 07-03-2019

Sorry, in the sync recipe it's in the Configuration tab.

Mattsco

n0thing233 · 07-03-2019

In the sync -> Configuration tab -> Settings, I have only two options: "Free output schema (name-based matching)" and "Maintain strict schema equality". There is still no "redispatch partitioning according to input columns".

Mattsco · 07-03-2019

In the sync -> Configuration tab -> Output, can you confirm the dataset is partitioned by something?
You should see that mentioned below the name of the dataset.

Mattsco

n0thing233 · 07-03-2019

Thanks. I finally made some progress. I'm now able to partition the dataset.

Valengo · 01-20-2021

Hello @n0thing233, I'm in the same situation as you, and I'm really interested in how you did the partitioning in the sync recipe. Could you please explain the steps to follow?

Thank you very much in advance.

Labels

Partitioning