Partitioning Database: why does DSS replace values of partitioned field causing information loss?
Hi,
I am working with a relatively big file (250M). I need to sync it from a file to a Postgres database, defining year-based partitions on the output (Postgres) dataset.
I am encountering the following problem:
In the source file I have only one datetime field, and I want to partition the output dataset by year based on that field. The problem is that in the output dataset, DSS replaces the value of the datetime field with the partition value (the year), so I lose all the other information (i.e. month, day, time).
Are you aware of any solution which does not involve duplicating the column in the data source?
Thank you.
Answers
-
Hi @Seymour93
Instead of using a sync recipe, perhaps you could use a prepare recipe. Inside the prepare recipe you could parse your datetime field (if not already done) and extract the year from it into a new column. Then, in your output dataset, use this new column as the partitioning column instead of the datetime field, so the original datetime values are left untouched.
(You might need to use the Redispatch option in the prepare recipe.)
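To illustrate the idea outside of DSS, here is a minimal sketch in plain Python. It is not DSS code and does not use any Dataiku API; the field names (`event_time`, `year`) and the helper functions are hypothetical. It only shows the principle: derive a separate year column to partition on, and keep the original datetime column as it is.

```python
from collections import defaultdict
from datetime import datetime


def add_year_column(rows, datetime_field, year_field="year",
                    fmt="%Y-%m-%d %H:%M:%S"):
    """Return copies of the rows with an extra year field.

    The original datetime field is not modified, so month, day and
    time are preserved.
    """
    enriched = []
    for row in rows:
        parsed = datetime.strptime(row[datetime_field], fmt)
        new_row = dict(row)  # copy, so the input rows stay unchanged
        new_row[year_field] = str(parsed.year)
        enriched.append(new_row)
    return enriched


def group_by_partition(rows, partition_field):
    """Group rows by the value of the partition field."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_field]].append(row)
    return dict(partitions)


# Hypothetical sample data with a single datetime field.
rows = [
    {"event_time": "2019-03-14 08:30:00", "value": 1},
    {"event_time": "2020-11-02 17:45:10", "value": 2},
    {"event_time": "2020-12-31 23:59:59", "value": 3},
]

enriched = add_year_column(rows, "event_time")
partitions = group_by_partition(enriched, "year")

for year in sorted(partitions):
    for row in partitions[year]:
        print(year, row["event_time"], row["value"])
```

In DSS itself the equivalent would be done with prepare recipe steps rather than code; the sketch is only meant to show why partitioning on a derived column avoids overwriting the datetime field.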
I hope this helps!
-
CoreyS
@Seymour93
You may also find this resource on partitioning helpful: Partitioning in Dataiku DSS - Watch on Demand