Added on May 2, 2023 1:20PM
I would like to convert a partitioned table that has several partitioning dimensions into a non-partitioned table.
So it seems logical to use the "all available" partition dependency function for each dimension, as in the screenshot below:
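However, this can result in an error, as Dataiku seems to build every possible combination of the values found in each dimension, rather than only the combinations that actually exist.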
So, as expected, I encounter an error because Dataiku looks for certain combinations that do not exist in my data.
For example, as seen in the screenshot above, the pattern "2023/44" (where "2023" is the year and "44" is the week number) does not yet exist (at the time of writing this post, we are in week 18 of 2023).
So, isn't there a simple way to collect all currently available partitions (and not all theoretical partitions, which, IMO, is how Dataiku works)?
cc: @ElieA
Hi @tanguy,
I think the easiest option would be to create a brand-new input S3 dataset that points to the same location as your partitioned dataset. You can then leave partitioning disabled on that dataset, and your entire dataset will be read in from S3.
Thanks,
Sarina
Hi @SarinaS,
Thank you for your answer.
Indeed, I resorted to this solution (which I have also used to point to a higher partitioning granularity, e.g. at the year level in the above example).
IMHO, it is not completely satisfactory though, as it breaks the lineage in the flow and reduces the pipeline's readability.
I believe it would be nice to offer an "all *existing* available" dependency function, which would parse the partitions that actually exist (as in the solution you propose) rather than computing all possible combinations between partitioning dimensions, as the "all available" dependency function currently seems to do.
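To make the difference concrete, here is a minimal pure-Python sketch of the two behaviours. It does not use the Dataiku API: the partition identifiers, the "year/week" pattern, and the helper name `theoretical_partitions` are all assumptions made up for illustration, and the cross-product logic only mimics what "all available" appears to do based on the error described above.

```python
from itertools import product

def theoretical_partitions(existing):
    """Cross product of the values seen in each dimension.

    Mimics what "all available" on every dimension appears to compute
    (an assumption based on the behaviour described in this thread).
    """
    years = sorted({p.split("/")[0] for p in existing})
    weeks = sorted({p.split("/")[1] for p in existing})
    return [f"{year}/{week}" for year, week in product(years, weeks)]

# Hypothetical list of partitions that really exist, in "year/week" form.
available = ["2022/44", "2022/45", "2023/17", "2023/18"]

# Combinations that the cross product requests but that do not exist.
missing = sorted(set(theoretical_partitions(available)) - set(available))
print(missing)
```

With this sample data, the cross product asks for eight partitions, four of which (including "2023/44") do not exist, which is the kind of request that fails. An "all existing" dependency function would simply return the `available` list as-is.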