Request for Guidance on Adding a Column Name to Dataset Path for Versioning System Implementation

Suhail

Hello Everyone,

I am currently implementing a data versioning system for my dataset. One approach I am considering is to tag each version with a timestamp and store each version in its own directory.

To do this, I plan to add a version column to the dataset and use its timestamp value to build the S3 file path. My question is: how can I include a column value in the dataset path?
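A minimal sketch of the path-building step, assuming one directory-style `version=<timestamp>` prefix per version. The bucket `my-bucket`, the `sales` prefix and the helper `versioned_prefix` are hypothetical. The code only builds the path string and does not read from or write to S3.

```python
from datetime import datetime, timezone


def versioned_prefix(base: str, version_ts: datetime) -> str:
    """Build a prefix such as 's3://bucket/data/version=20230331T170935Z/'."""
    # Convert to UTC so every version uses the same clock, whatever the local time zone
    stamp = version_ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Strip any trailing slash on the base so we never produce '//' in the path
    return f"{base.rstrip('/')}/version={stamp}/"


if __name__ == "__main__":
    ts = datetime(2023, 3, 31, 17, 9, 35, tzinfo=timezone.utc)
    print(versioned_prefix("s3://my-bucket/sales", ts))
    # prints s3://my-bucket/sales/version=20230331T170935Z/
```

A fixed-width, zero-padded timestamp means that sorting the directory names alphabetically also sorts them chronologically, which makes "latest version" easy to find. Pass timezone-aware datetimes; a naive datetime is interpreted as local time by `astimezone`.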


Operating system used: macOS

AlexT
Dataiker

Hi Suhail,

One possible solution here is to use partitioning. If you select "All available" as the input partitions and sync to a partitioned output dataset, this will create a full copy of your dataset under that partition's path.
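To illustrate the redispatch idea from the linked article (each row is routed to the partition named by the value of a column), here is a plain-Python sketch of the resulting directory layout. It assumes a discrete partition dimension backed by a column called `version`; the helper `redispatch` and the sample rows are hypothetical. In Dataiku the recipe does this routing for you, so this only shows what ends up where.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

Row = Mapping[str, object]


def redispatch(rows: Iterable[Row], column: str, base: str) -> Dict[str, List[Row]]:
    """Group rows by the value of `column`, one directory per distinct value."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        # The column value becomes a path segment, e.g. /sales/20230331/
        path = f"{base.rstrip('/')}/{row[column]}/"
        partitions[path].append(row)
    # Plain dict; keys keep the order in which each partition was first seen
    return dict(partitions)


if __name__ == "__main__":
    rows = [
        {"id": 1, "version": "20230330"},
        {"id": 2, "version": "20230331"},
        {"id": 3, "version": "20230331"},
    ]
    for path, part in redispatch(rows, "version", "/sales").items():
        print(path, [r["id"] for r in part])
```

Running it prints `/sales/20230330/ [1]` and then `/sales/20230331/ [2, 3]`.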

https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html

If you just want to change the path to your S3 dataset and control this yourself, you could use a project variable within the path itself.
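Dataiku references variables in settings with the `${variable_name}` syntax. Python's `string.Template` uses the same `${...}` placeholder style, so the sketch below mimics how a path such as `/sales/${version}/` would expand. It assumes a project variable named `version`; the variable name, the path and the helper `resolve_path` are hypothetical, and the real substitution is done by Dataiku, not by this code.

```python
from string import Template


def resolve_path(path_template: str, variables: dict) -> str:
    """Expand ${name} placeholders in a path, as a stand-in for variable substitution."""
    # substitute() raises KeyError when a placeholder has no value, so a
    # misspelled variable name fails loudly instead of producing a wrong path
    return Template(path_template).substitute(variables)


if __name__ == "__main__":
    print(resolve_path("/sales/${version}/", {"version": "20230331T170935Z"}))
    # prints /sales/20230331T170935Z/
```

With this approach, creating a new version amounts to updating the `version` variable (for example, at the start of a scenario) before rebuilding the dataset, so each build writes to a fresh directory.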

[Screenshot: project variable used in the S3 dataset path]
