
Partitioning Database: why does DSS replace values of partitioned field causing information loss?


Hi,

I am working with a relatively big file (250M). I need to sync it from a file to a Postgres database, with the output dataset (Postgres) partitioned by year.

I am encountering the following problem:

In the source file I have only one datetime field, and I want to partition the output dataset by year based on that field. The problem is that in the output dataset, DSS replaces the value of the datetime field with the partition value (the year), so I lose all the other information (month, day, time).

Are you aware of any solution that does not involve duplicating the column in the data source?

 

Thank you.

2 Replies
Dataiker

Hi @Seymour93 

Instead of using a Sync recipe, perhaps you could use a Prepare recipe. Inside the Prepare recipe, you could parse your datetime field (if that is not already done) and extract the year from it into a new field. Then, in your output dataset, use this new field as the partitioning column instead of the datetime field, so the original datetime values are left untouched.

(You might need to use the Redispatch option in the Prepare recipe.)
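
To make the idea concrete, here is a minimal plain-Python sketch of the logic only. It is not DSS code and does not use any Dataiku API; the field names `event_time` and `event_year` and both helper functions are hypothetical, chosen just for illustration. It derives a separate year value from the datetime and groups rows by that value, which is roughly what partitioning on a derived column with redispatch would achieve, while the full datetime stays intact.

```python
from collections import defaultdict
from datetime import datetime


def add_year_column(rows, ts_field="event_time", year_field="event_year"):
    """Return copies of the rows with a derived year field added.

    The original timestamp field is left unchanged. Field names are
    hypothetical and only serve this illustration.
    """
    out = []
    for row in rows:
        ts = datetime.fromisoformat(row[ts_field])  # parse the ISO 8601 string
        new_row = dict(row)                         # copy, do not mutate the input
        new_row[year_field] = str(ts.year)          # derived partition value
        out.append(new_row)
    return out


def redispatch_by_year(rows, year_field="event_year"):
    """Group rows by the derived year, mimicking one partition per year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[year_field]].append(row)
    return dict(partitions)


rows = [
    {"id": 1, "event_time": "2019-03-14T08:30:00"},
    {"id": 2, "event_time": "2020-11-02T17:45:10"},
    {"id": 3, "event_time": "2020-01-01T00:00:00"},
]

partitions = redispatch_by_year(add_year_column(rows))
for year in sorted(partitions):
    print(year, [r["event_time"] for r in partitions[year]])
```

Each row ends up in the group for its year, and `event_time` still carries the month, day, and time, because the partition value lives in its own field.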

I hope this helps!

Community Manager

@Seymour93 you may also find this resource on partitioning helpful: Partitioning in Dataiku DSS - Watch on Demand
