Adding a version column to the dataset path for a data versioning system
I am currently working on implementing a data versioning system for my dataset. One approach I am considering involves appending a timestamp to each version and organizing the data into separate directories.
To accomplish this, I plan to add a version column to the dataset and use its timestamp value to set the S3 file path. My question is: how do I include a column's value in the dataset path?
One possible solution is to use partitioning. If you select all available data for the partition and sync it to a partitioned output dataset, this will create a full copy of your dataset within that partition's path.
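To illustrate the layout that partitioning on a version column produces, here is a minimal Python sketch. It is not the platform's own sync mechanism: the function names `version_label` and `write_partitioned`, the `version=<value>` directory naming, and the `part-0000.csv` file name are all assumptions made for illustration, and it writes to a local directory rather than to S3.

```python
import csv
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def version_label(ts: datetime) -> str:
    """Format a timezone-aware timestamp as a path-safe label (hypothetical format)."""
    return ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def write_partitioned(rows, base_dir, column="version"):
    """Write rows to <base_dir>/<column>=<value>/part-0000.csv, one directory per value.

    Returns a dict mapping each partition value to the file written for it.
    """
    # Group the rows by the value of the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)

    written = {}
    for value, group in groups.items():
        part_dir = Path(base_dir) / f"{column}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out_path = part_dir / "part-0000.csv"
        # The partition value lives in the path, so the column is left out of the file.
        fields = [name for name in group[0] if name != column]
        with out_path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written[value] = out_path
    return written
```

Calling `write_partitioned(rows, "/tmp/my_dataset")` with rows whose `version` values come from `version_label(...)` gives one directory per version, each holding the rows for that version. On S3 the same idea would map to one key prefix per version; the actual path pattern depends on how the partitioned output dataset is configured.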