
Generating Project Identifier for Versioning Training Data


Is there an easy way to identify which version of a project (for example, by git hash) was used to retrain a particular model? My use case is as follows. I would like to version my training data, so that each time the training data in the Flow is updated and the model is retrained (manually or through a scenario), another scenario (probably Python) runs and backs up that training dataset to my RDBMS (or to S3) in a way that I can link back to the project at that point in time. Has anyone done something like this before? I was thinking of using a git hash combined with the date as the identifier.
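To make the idea concrete, here is a rough sketch of the kind of identifier I have in mind. It uses only the Python standard library and plain git. The function names, the `MYPROJECT` key, and the identifier layout are all made up for illustration, and it assumes the project's git repository is reachable on disk from wherever the scenario runs, which I have not confirmed.

```python
import subprocess
from datetime import datetime, timezone


def current_git_hash(repo_dir="."):
    """Return the short hash of HEAD for the git repository at repo_dir.

    Assumption: repo_dir points at a git working copy of the project.
    """
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails (e.g. not a repository)
    )
    return result.stdout.strip()


def build_snapshot_id(project_key, git_hash, when=None):
    """Combine project key, UTC timestamp, and git hash into one identifier.

    The layout <project>_<timestamp>_<hash> is a hypothetical convention.
    """
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{project_key}_{stamp}_{git_hash}"


# Example with fixed inputs so the result is reproducible.
example_id = build_snapshot_id(
    "MYPROJECT",
    "abc1234",
    datetime(2020, 10, 29, 12, 0, 0, tzinfo=timezone.utc),
)
```

The backup scenario could then use this identifier as a table suffix or an S3 key prefix for the copied training dataset, so each snapshot can be traced back to a commit and a time.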

 

Thanks,

Tim  
