I am working on a model flow and looking to add some automated evaluation and retraining. I've built a functional flow, but I feel like it could be slicker. Does anyone have any tips on how I could improve it?
My flow looks like this:
Steps are as follows:
I want to predict every day, but I don't want to train my model every day.
In my current flow, I need to load all my training data so I can split it and evaluate my model. Is this normal? It means that if I wanted to evaluate my model every day, I would have to load and split my full training dataset every time. Again, is this normal? 🙂
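One way to get "predict every day, retrain only when needed" is to let the daily evaluation decide whether retraining runs. The sketch below shows that decision logic in plain Python, outside of Dataiku. The `EvaluationResult` class, the metric, and both thresholds (`min_score`, `max_model_age_days`) are hypothetical and would need to match your own evaluation step; how you wire it into a scheduled job is left open.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class EvaluationResult:
    """Hypothetical record of one daily evaluation run."""
    run_date: date
    score: float  # assumed metric where higher is better, e.g. AUC on the test split


def should_retrain(
    latest: EvaluationResult,
    last_trained: date,
    min_score: float = 0.75,
    max_model_age_days: int = 30,
) -> bool:
    """Return True when the model should be retrained today.

    Retrains if the evaluation score drops below `min_score` or if the model
    is older than `max_model_age_days`. Both thresholds are illustrative.
    """
    too_weak = latest.score < min_score
    too_old = (latest.run_date - last_trained) > timedelta(days=max_model_age_days)
    return too_weak or too_old


def daily_plan(latest: EvaluationResult, last_trained: date) -> list[str]:
    """List today's steps: always evaluate and predict, retrain only if needed."""
    steps = ["evaluate"]
    if should_retrain(latest, last_trained):
        steps.append("retrain")
    steps.append("predict")
    return steps


if __name__ == "__main__":
    today = date(2024, 5, 2)
    print(daily_plan(EvaluationResult(today, 0.81), last_trained=date(2024, 4, 20)))
    # ['evaluate', 'predict']
    print(daily_plan(EvaluationResult(today, 0.70), last_trained=date(2024, 4, 20)))
    # ['evaluate', 'retrain', 'predict']
```

Combining a score threshold with a maximum model age means the model still gets refreshed periodically even if the metric drifts slowly without ever crossing the threshold.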
Thanks for your help,
The logic of the flow fits the use case you describe well.
The challenge you describe, the time it takes to load the data, is linked to your specific GCS/BigQuery setup. I see that you have opened another discussion on this specific issue.
I believe that once you solve this upstream problem, the rest of your flow's logic is good to go.
Hope it helps,
Thanks Alex. Yes, indeed, improving the speed makes it much more manageable to evaluate at a higher frequency. Thanks for looking over the flow; it's my first evaluation loop, so it's good to know I'm on the right track!
By the way, you may also get some processing speed-up by pushing the computation down to BigQuery (BQ). The Prepare recipe can convert most processors to BQ, as long as both its input and output datasets are in BQ.
The Split recipe cannot push directly to BQ as of today. Instead, you could use two Filter recipes, which can push to BQ.
That should provide a nice speed-up!
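Two Filter recipes only reproduce a Split if their conditions are deterministic and exact complements of each other; two independent random conditions could overlap or drop rows. A minimal Python sketch of that idea, assuming each row has a stable `id` column (hypothetical) and hashing it into 100 buckets so the 80/20 split is reproducible across runs. In BigQuery, a hash function such as `FARM_FINGERPRINT` could play the role of `hashlib.md5` here, but the exact expression you would enter in each Filter recipe is an assumption, not something shown in this thread.

```python
import hashlib


def bucket(key: str, n_buckets: int = 100) -> int:
    """Deterministically map a row key to a bucket in [0, n_buckets)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def in_train(row: dict, train_pct: int = 80) -> bool:
    """Condition for the first (hypothetical) Filter recipe: the training rows."""
    return bucket(str(row["id"])) < train_pct


def in_test(row: dict, train_pct: int = 80) -> bool:
    """Condition for the second Filter recipe: the exact complement of in_train."""
    return not in_train(row, train_pct)


if __name__ == "__main__":
    rows = [{"id": i, "feature": i * 0.5} for i in range(1_000)]
    train = [r for r in rows if in_train(r)]
    test = [r for r in rows if in_test(r)]

    # Every row lands in exactly one output, as with a single Split recipe.
    assert len(train) + len(test) == len(rows)
    assert not {r["id"] for r in train} & {r["id"] for r in test}
    print(len(train), len(test))
```

If you translate the conditions to SQL, watch for NULL keys: a row whose condition evaluates to NULL passes neither `cond` nor `NOT cond`, so it would silently drop out of both outputs.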