Model metadata for feedback loop

Taylor Registered Posts: 15 ✭✭✭✭

We're looking to capture metadata and predictions from one of our machine learning models so we can set baselines and track progress over time. It's simple enough to capture the predictions and the class probabilities along with a date and timestamp, but is there a way to capture the model type (Random Forest, XGBoost, etc.) as well as some notion of the "version" of the model?

It would be interesting to capture this metadata so we can look back throughout the history of a given model to better understand how changes have made an impact on accuracy and other model metrics.

Here are some of our ideas, but we're hoping for something a bit more dynamic and less process-oriented than this:

  • Hard-coded strings in a Python recipe before/after the model that correspond to the model's type/version - would require manual updates
  • Flow variables that we update whenever we release a new bundle and send it off to our automation node
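As a concrete sketch of the first idea, here's roughly what we do today in plain pandas (the column names and metadata values below are just placeholders, not a Dataiku API):

```python
import pandas as pd
from datetime import datetime, timezone

# Hypothetical scored output from the model (illustrative values only).
scored = pd.DataFrame({
    "prediction": ["yes", "no"],
    "proba_yes": [0.91, 0.12],
})

# Hard-coded metadata strings, updated manually on each release.
MODEL_TYPE = "RANDOM_FOREST"   # e.g. Random Forest, XGBoost, ...
MODEL_VERSION = "v3"           # bumped whenever a new bundle ships

# Attach the metadata to every scored row alongside the predictions.
scored["model_type"] = MODEL_TYPE
scored["model_version"] = MODEL_VERSION
scored["scored_at"] = datetime.now(timezone.utc).isoformat()
```

The second idea is the same, except the two constants would be read from flow variables instead of being hard-coded in the recipe.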

I'm not sure how the back-end of the ML recipes works, but I'm imagining the scoring recipe somehow having access to the model's metadata. Then you could have configuration options on the scoring recipe to check a box for each piece of metadata you would/wouldn't like attached to the scored dataset along with the model predictions and class probabilities.

Apologies if there is a question on the forum I missed or if there is a simple solution we've overlooked!

Note: we're on 5.1.5 right now, but planning our upgrade to 6.0





  • Clément_Stenac Dataiker, Dataiku DSS Core Designer Posts: 753 Dataiker


    Thanks for the detailed feedback. Being able to capture model metadata while scoring is definitely in our backlog, and we'll be making sure to register your interest for this.

    We don't see a much more efficient approach than your current workarounds, and we completely agree that they're a bit heavy.
