Partition - Discrete dimension example
mpalangetic
Hello, could you please provide an example of partitioning a dataset based on the discrete values in one column? Everything I can find on the web uses partitioning based on a time dimension.
Answers
-
You may want to partition a dataset by country for instance, or by customer.
-
I have multiple devices, and I want to partition the dataset by Device_ID, which is one column in my dataset. But when I go to Settings -> Partitioning -> Add discrete dimension and put Device_ID in the name field, no partitions are extracted. So my question is: how do I provide the correct pattern in that case?
-
It depends on the type of your dataset: most file-based datasets are partitioned by folder, while SQL datasets are partitioned by the value of a certain column. See https://doc.dataiku.com/dss/latest/partitions/index.html for more information.
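To picture what "partitioned by folder" means when the partition value currently lives in a column, here is a minimal sketch in plain Python. It is not DSS code and does not use any Dataiku API; the function name, the sample rows, and the `part.csv` file name are all made up for illustration. It only shows the general idea of writing the rows for each distinct column value into their own folder.

```python
import csv
from collections import defaultdict
from pathlib import Path

# Hypothetical sample data: Device_ID is the column we want to partition on.
SAMPLE_ROWS = [
    {"Device_ID": "dev_a", "reading": "1.5"},
    {"Device_ID": "dev_b", "reading": "0.7"},
    {"Device_ID": "dev_a", "reading": "2.0"},
]


def split_rows_into_folders(rows, column, out_dir):
    """Write the rows for each distinct value of `column` to its own sub-folder.

    Returns a dict mapping each value to the CSV file written for it.
    Illustration only: this mimics the folder layout, not DSS internals.
    """
    # Group the rows by the value found in the chosen column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)

    written = {}
    for value, group in groups.items():
        # One folder per distinct value, e.g. <out_dir>/dev_a/part.csv
        folder = Path(out_dir) / str(value)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "part.csv"
        with target.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written[value] = target
    return written
```

Calling `split_rows_into_folders(SAMPLE_ROWS, "Device_ID", "out")` would leave one folder per device under `out`, which is the kind of layout a folder-based partitioning scheme can then pick up.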
If you have a file-based dataset with a column that you want to partition on, you can use a Sync recipe to a new partitioned dataset with the "redispatch partition according to column" option enabled.
-
Unfortunately that is not helpful, because you need to run a sync on the existing dataset to do that. I want to create my partitioning for the first time. And yes, my dataset is file-based.
-
That is still the solution. Here is a step-by-step guide: https://www.dataiku.com/learn/guide/other/partitioning/partitioning-redispatch.html
Instead of the Year time dimension, add a discrete dimension. Don't forget to insert it in the partitioning pattern. For example, if the dimension is called "device", the partitioning pattern for the output dataset should look like "%{device}/.*".
-
Did you get your issue resolved? I think I have the same issue.
-
I'm not sure what issue you're referring to. Did you try using redispatch on a sync recipe, as suggested above?
-
Problem solved, thank you. I just had to run the "redispatch" first, before listing partitions.
Thanks.
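For anyone landing here later, a rough way to see why listing partitions finds nothing until the data has been redispatched into folders: the pattern "%{device}/.*" quoted above only matches paths that start with a folder name. The sketch below is plain Python and is an assumption about how such a pattern can be read, not how DSS implements it; `pattern_to_regex` and `list_partitions` are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern such as '%{device}/.*' into a compiled regex.

    Each '%{name}' becomes a named group matching one path segment; the
    rest of the pattern is kept as regex text. Illustration only.
    """
    regex = re.sub(r"%\{(\w+)\}", r"(?P<\1>[^/]+)", pattern)
    return re.compile(regex)


def list_partitions(paths, pattern, dimension):
    """Return the sorted distinct values of `dimension` found in `paths`."""
    matcher = pattern_to_regex(pattern)
    found = set()
    for path in paths:
        match = matcher.fullmatch(path)
        if match:
            found.add(match.group(dimension))
    return sorted(found)


# Before redispatching: a single flat file, so no folder to read a value from.
before = ["devices.csv"]
# After redispatching: one folder per device (hypothetical file names).
after = ["dev_a/part.csv", "dev_b/part.csv", "dev_a/more.csv"]

partitions_before = list_partitions(before, "%{device}/.*", "device")
partitions_after = list_partitions(after, "%{device}/.*", "device")
```

With the flat file, nothing matches the pattern, so no partitions are listed; once the files sit in per-device folders, each folder name becomes a partition value.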