Request for Guidance on Adding a Column Name to Dataset Path for Versioning System Implementation
Hello Everyone,
I am currently working on implementing a data versioning system for my dataset. One approach I am considering involves appending a timestamp to each version and organizing the data into separate directories.
To accomplish this, I am planning to add a version column to the dataset and use its timestamp to change the S3 file path. My question is how to include this column in the dataset path.
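To make the goal concrete, here is a minimal sketch in plain Python (no Dataiku API involved) of the layout described above: each distinct value of a version column becomes its own directory under an S3 prefix. The bucket name, prefix, column name `version`, and the `column=value` directory style are all hypothetical choices for illustration, not what DSS necessarily produces.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical bucket and prefix, for illustration only.
BASE_PATH = "s3://my-bucket/my_dataset"


def version_label(ts: datetime) -> str:
    """Format a timezone-aware timestamp as a path-safe label (no colons)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def split_by_version(rows, version_column="version"):
    """Group rows into one directory path per distinct value of version_column."""
    layout = defaultdict(list)
    for row in rows:
        label = version_label(row[version_column])
        # One "column=value" directory per version (a common convention)
        layout[f"{BASE_PATH}/{version_column}={label}/"].append(row)
    return dict(layout)


if __name__ == "__main__":
    rows = [
        {"id": 1, "version": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"id": 2, "version": datetime(2024, 2, 15, 8, 30, tzinfo=timezone.utc)},
    ]
    for path, members in split_by_version(rows).items():
        print(path, [r["id"] for r in members])
```

Formatting the timestamp without colons keeps the directory names safe for tools that treat `:` specially.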
Operating system used: macOS
Answers
Alexandru (Dataiker)
Hi Suhail,
One possible solution here is to use partitioning. If you select "All available" partitions and sync to a partitioned output dataset, this will create a full copy of your dataset within that partition's path.
https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html
If you just want to change the path of your S3 dataset and control it yourself, you could use a project variable within the path itself.
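A minimal sketch of the project-variable idea, assuming a hypothetical variable named `version_ts` and a dataset path setting such as `/my_dataset/${version_ts}/`. It uses Python's standard `string.Template`, which happens to share the `${name}` placeholder syntax, only to show how changing the variable's value points the same path setting at a new directory. It does not call any Dataiku API: inside DSS you would store the value as a project variable (for example in the project's variables settings, or through the `dataikuapi` project methods `get_variables()` / `set_variables()`), and DSS would perform the expansion itself.

```python
from datetime import datetime
from string import Template

# Hypothetical path setting containing a ${version_ts} placeholder.
PATH_TEMPLATE = "/my_dataset/${version_ts}/"


def new_version_variables(ts: datetime) -> dict:
    """Build the (hypothetical) variable set for a new version from a timestamp."""
    return {"version_ts": ts.strftime("%Y%m%dT%H%M%S")}


def resolve_path(template: str, variables: dict) -> str:
    """Expand ${name} placeholders; raises KeyError if a variable is missing."""
    return Template(template).substitute(variables)


if __name__ == "__main__":
    variables = new_version_variables(datetime(2024, 3, 5, 14, 7, 9))
    print(resolve_path(PATH_TEMPLATE, variables))
```

Using `substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of silently writing to a literal `${version_ts}` directory.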