Partitioning Database: why does DSS replace values of partitioned field causing information loss?

Seymour93
Level 2

Hi,

I am working with a relatively big file (250M). I need to sync it from a file to a Postgres database, defining yearly partitions on the output (Postgres) dataset.

I am encountering the following problem:

The source file has only one datetime field, and I want to partition the output dataset by year based on that field. The problem is that, in the output dataset, DSS replaces the value of the datetime field with the partition value (the year), so I lose all the other information (i.e. month, day, time).

Are you aware of any solution that does not involve duplicating the column in the data source?

Thank you.

2 Replies
Liev
Dataiker Alumni

Hi @Seymour93 

Instead of a Sync recipe, you could use a Prepare recipe. In the Prepare recipe, parse your datetime field (if it is not already parsed) and extract the year from it into a new column. Then partition the output dataset on this new column instead of the datetime field, which should leave the original datetime values intact.

(You might need to use the Redispatch option in the Prepare recipe.)
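To illustrate the idea outside of DSS, here is a minimal plain-Python sketch of what those steps amount to: derive a separate year column, keep the full timestamp as it is, and send each row to the partition matching its year. This is not the DSS API; the function names and the `event_time` / `event_year` column names are hypothetical, and the timestamps are assumed to be ISO 8601 strings.

```python
from collections import defaultdict
from datetime import datetime


def add_year_column(rows, ts_field="event_time", year_field="event_year"):
    """Return copies of the rows with an extra year column.

    The original timestamp field is left untouched, so month, day and
    time are preserved. Field names are hypothetical.
    """
    out = []
    for row in rows:
        parsed = datetime.fromisoformat(row[ts_field])  # assumes ISO 8601 input
        new_row = dict(row)
        new_row[year_field] = str(parsed.year)
        out.append(new_row)
    return out


def dispatch_by_year(rows, year_field="event_year"):
    """Group rows by the derived year column (a stand-in for redispatching)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[year_field]].append(row)
    return dict(partitions)


rows = [
    {"id": 1, "event_time": "2019-03-14T08:30:00"},
    {"id": 2, "event_time": "2020-11-02T17:45:10"},
    {"id": 3, "event_time": "2020-01-01T00:00:00"},
]

partitions = dispatch_by_year(add_year_column(rows))
for year in sorted(partitions):
    print(year, [r["event_time"] for r in partitions[year]])
```

Because the partition key lives in its own column, nothing overwrites the full timestamp in each row.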

I hope this helps!

CoreyS
Dataiker Alumni

@Seymour93 you may also find this resource on partitioning helpful: Partitioning in Dataiku DSS - Watch on Demand
