Request for Guidance on Adding a Column Name to Dataset Path for Versioning System Implementation

Suhail · March 2023

Hello Everyone,

I am currently working on implementing a data versioning system for my dataset. One approach I am considering involves appending a timestamp to each version and organizing the data into separate directories.

To accomplish this, I am planning to add a version column to the dataset and use it to modify the S3 file path based on the associated timestamp. My question is: how can I add a column name to the dataset path?
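A minimal sketch of the approach described above, in plain Python and assuming the dataset is handled as a list of dict rows. It derives a sortable version label from a UTC timestamp, stores that label in a version column, and builds one S3 prefix per version. The bucket name, base path, and helper names are hypothetical, and the code does not touch S3 or any Dataiku API.

```python
from datetime import datetime, timezone


def version_label(ts: datetime) -> str:
    """Return a sortable UTC label such as '20230331T170935Z'."""
    if ts.tzinfo is None:
        # Naive timestamps are ambiguous, so require an explicit timezone.
        raise ValueError("use a timezone-aware timestamp")
    return ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def add_version_column(rows, label, column="version"):
    """Return copies of the rows with the version label stored in `column`."""
    return [{**row, column: label} for row in rows]


def versioned_path(bucket, base, label):
    """Build a hypothetical S3 prefix with one directory per version."""
    return f"s3://{bucket}/{base.strip('/')}/{label}/"


ts = datetime(2023, 3, 31, 17, 9, 35, tzinfo=timezone.utc)
label = version_label(ts)
rows = add_version_column([{"id": 1}, {"id": 2}], label)
print(versioned_path("my-bucket", "datasets/sales", label))
```

Because the label is written year first and in UTC, sorting the directory names also sorts the versions chronologically.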

Operating system used: macOS

Alexandru · March 2023

Hi Suhail,

One possible solution here may be to use partitioning. If you select "All available" for the partition and sync to a partitioned output dataset, this will create a full copy of your dataset within that partition's path.
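The idea behind redispatch partitioning is that each row is routed to the partition named by the value of one of its columns. The plain-Python sketch below only illustrates that idea using a version column; the one-folder-per-value layout and the `redispatch` helper are assumptions, and this is not how Dataiku implements the Sync recipe.

```python
from collections import defaultdict


def redispatch(rows, column, base_path):
    """Group rows by the value of `column`; each group maps to its own sub-path."""
    partitions = defaultdict(list)
    for row in rows:
        # Hypothetical layout: one folder per distinct column value.
        partitions[f"{base_path.rstrip('/')}/{row[column]}/"].append(row)
    return dict(partitions)


rows = [
    {"version": "v1", "x": 1},
    {"version": "v2", "x": 2},
    {"version": "v1", "x": 3},
]
for path, part in redispatch(rows, "version", "s3://my-bucket/datasets/sales").items():
    print(path, len(part))
```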

https://knowledge.dataiku.com/latest/mlops-o16n/partitioning/concept-redispatch.html

If you just want to change the path to your S3 dataset and control this yourself, you could use a project variable within the path itself.
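For the project-variable route, the path would contain a placeholder such as `${version}` (the variable name `version` is hypothetical). Python's `string.Template` uses the same `${name}` syntax, so it can mimic the substitution locally to check what a path resolves to. This is an illustration only, not Dataiku's own variable resolution.

```python
from string import Template


def resolve_path(path_template: str, variables: dict) -> str:
    """Replace ${name} placeholders in a path with values from `variables`."""
    # substitute() raises KeyError if a placeholder has no value,
    # so a missing variable fails loudly instead of producing a bad path.
    return Template(path_template).substitute(variables)


print(resolve_path("/datasets/sales/${version}/", {"version": "20230331T170935Z"}))
```

With this approach, switching to a new version means updating only the variable's value while the dataset settings stay the same. Note that `Template` also treats a bare `$name` as a placeholder and `$$` as a literal `$`.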

[Screenshot: S3 dataset path using a project variable]
