
Identify lines based on partition variable

JBR
Level 1
Hi,

I'm creating datasets based on files in an S3 bucket.

The files in the bucket are in a single folder, but have several name patterns, such as "blue_01012017.csv", "red_02012017.csv", etc.

Using partitioning, I have defined "blue", "red", etc. as a partition variable called "source". This information is not included in the data itself.

What I want to do is either:

- directly split my dataset based on that "source" value

- or include a "source" column in my dataset that would have the appropriate value for each line, based on the file it came from, so I can split it later based on that value.

I can't seem to find a way to do this, can you help?

Thanks a lot in advance,

Julien
2 Replies
Clément_Stenac
Dataiker

It is indeed not currently possible to retrieve the source partition as a value inside the data.



You can, however, achieve the split with multiple sync recipes, each of which selects a single input partition through its partition dependencies.
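To illustrate what the per-partition split amounts to, here is a minimal plain-Python sketch that groups file names by their "source" prefix, one group per partition value. It runs outside DSS and uses no Dataiku API; the regular expression, the function names `source_of` and `group_by_source`, and the third file name are assumptions made for illustration, based only on the naming pattern described in the question.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <source>_<eight digits>.csv, e.g. "blue_01012017.csv".
FILE_PATTERN = re.compile(r"^(?P<source>[A-Za-z]+)_(?P<date>\d{8})\.csv$")


def source_of(file_name):
    """Return the 'source' prefix of a file name, or None if it does not match."""
    match = FILE_PATTERN.match(file_name)
    return match.group("source") if match else None


def group_by_source(file_names):
    """Group file names by source, mirroring one output per partition value."""
    groups = defaultdict(list)
    for name in file_names:
        source = source_of(name)
        if source is not None:  # skip files that do not follow the pattern
            groups[source].append(name)
    return dict(groups)


# The first two names come from the question; the third is hypothetical.
files = ["blue_01012017.csv", "red_02012017.csv", "blue_03012017.csv"]
groups = group_by_source(files)
```

Each key of `groups` plays the role of one "source" partition, and each list holds the files that a sync recipe restricted to that partition would read.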



JBR
Level 1
Author
Thanks Clément, it does the trick perfectly!