Hello Everyone,
I am working on a data versioning system for my dataset. One approach I am considering is to tag each version with a timestamp and store each version in its own directory.
To do this, I plan to add a version column to the dataset and use its value to change the S3 file path according to the associated timestamp. My question is: how can I include the value of a column in the dataset path?
Operating system used: macOS
Hi Suhail,
One possible solution here is to use partitioning. If you select all available partitions and sync to a partitioned output dataset, this will create a full copy of your dataset within each partition's path.
https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html
If you only want to change the path of your S3 dataset and control it yourself, you could use a project variable within the path itself.
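As a rough sketch of the project-variable idea, the snippet below builds a timestamp tag and previews what a path containing a variable would look like once expanded. The variable name `data_version`, the path `/my-data/${data_version}`, and both helper functions are made up for illustration. The expansion here is done locally with Python's `string.Template` only to show the result; Dataiku performs its own expansion when the dataset is read or written. The commented-out lines at the end show how the variable might be set through the Dataiku Python API, assuming the code runs inside DSS.

```python
from datetime import datetime, timezone
from string import Template


def make_version_tag(now: datetime) -> str:
    """Format a timestamp as a path-safe UTC version tag."""
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def expand_path(path_template: str, variables: dict) -> str:
    """Locally mimic ${name} expansion to preview the resulting path."""
    return Template(path_template).substitute(variables)


# Hypothetical path as it might be entered in the S3 dataset settings.
PATH_TEMPLATE = "/my-data/${data_version}"

# A fixed timestamp keeps the example reproducible; use datetime.now(timezone.utc) in practice.
tag = make_version_tag(datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc))
preview = expand_path(PATH_TEMPLATE, {"data_version": tag})
print(preview)  # /my-data/20240115T093000Z

# Inside Dataiku, the project variable could then be updated, for example:
#   import dataiku
#   project = dataiku.api_client().get_default_project()
#   variables = project.get_variables()
#   variables["standard"]["data_version"] = tag   # hypothetical variable name
#   project.set_variables(variables)
```

With this setup, each run that updates the variable before building the dataset would write to a new directory, so older versions stay in place under their own timestamped paths.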