model scoring

Herve · ‎01-13-2021

Is there a way to get the model name and version to appear in the output of the Evaluate recipe (model scoring)?

HenriC · ‎01-14-2021

I did not understand that it was the metrics table you wanted to edit with the model name and version. It makes much more sense now.

The first solution would then be to use a Python recipe to add the two columns to your dataset. Place this recipe after the metrics dataset has been created, with the model and the metrics dataset as inputs. That way, you can use the Dataiku Python API to get any information you want about the model and then add it to the output dataset.
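A minimal sketch of such a Python recipe could look like the following. It assumes the `dataiku` package that is available inside DSS, and that `dataiku.Model.list_versions()` returns dictionaries carrying `versionId` and `active` keys. The saved model ID and the dataset names are hypothetical placeholders. The DSS-specific part is kept inside `run_recipe()` so that the small version-picking helper can be tried outside DSS.

```python
def pick_active_version(versions):
    """Return the id of the active version, or None if none is flagged active.

    Assumes each entry is a dict with "versionId" and "active" keys.
    """
    for version in versions:
        if version.get("active"):
            return version.get("versionId")
    return None


def run_recipe():
    """Body of the Python recipe; only meaningful when run inside DSS."""
    import dataiku  # provided by DSS in recipes and notebooks

    # Hypothetical identifiers: replace them with your own.
    model = dataiku.Model("MY_SAVED_MODEL_ID")
    metrics = dataiku.Dataset("model_scoring")
    output = dataiku.Dataset("model_scoring_with_model_info")

    df = metrics.get_dataframe()
    # The same two values are written on every row of the metrics dataset.
    df["model_name"] = model.get_name()
    df["model_version"] = pick_active_version(model.list_versions())
    output.write_with_schema(df)


# Inside the recipe you would simply call run_recipe().
```

Because this stamps every row with the currently active version, it suits a metrics dataset that is rewritten on each run. If the metrics are appended, only the newly added rows should be stamped.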

Once the recipe is created, it can be automated with a scenario that is triggered each time the Evaluate recipe is run and that in turn runs the Python recipe.
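One way to sketch this, assuming a custom Python step in the scenario rather than a dataset-change trigger, is to build the Evaluate recipe output and then the Python recipe output in sequence. The `Scenario.build_dataset` call comes from `dataiku.scenario`; the dataset names and the `rebuild_in_order` helper are hypothetical, and the default build mode may not be the one you want.

```python
def rebuild_in_order(scenario, dataset_names):
    """Build each dataset in turn through the given scenario object.

    Returns the names that were built, in order.
    """
    built = []
    for name in dataset_names:
        scenario.build_dataset(name)
        built.append(name)
    return built


def run_step():
    """Body of a custom Python scenario step; only meaningful inside DSS."""
    from dataiku.scenario import Scenario

    # Hypothetical names: the Evaluate recipe output first,
    # then the output of the Python recipe that adds the model columns.
    rebuild_in_order(Scenario(), ["model_scoring", "model_scoring_with_model_info"])
```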

It is kind of a hack to do what you want but it would work.

We are aware of this limitation and are working on it for DSS 9.0; there should be some improvements in the next releases.

HenriC · ‎01-13-2021

Hi @Herve !

How would you like the name to be written in the output dataset? In a new column?

There is no way to do it natively in the recipe, but you can create a Python recipe that takes the model and the output dataset as input and adds any information you need to the output dataset.

I could provide you with an example once I understand how you would like the data to be represented 🙂

Have a great day,

Henri

Herve · ‎01-13-2021

Yes, I'd like the model name to be written in the output dataset as a new column when performing the model evaluation. This way the model_scoring dataset is fairly consistent with the Table view of the result tab in Visual Analysis - Models, except that model_scoring applies to deployed models.

HenriC · ‎01-13-2021

I see. One option is to add a new column named "model_infos" that holds the model information, such as the name and the version, on the first line only. The dataset wouldn't be very consistent that way, though.

We could also write the same information in the new column for every row (each row would then describe the model used), but that would increase the size of the output dataset with repeated information.

Which one do you prefer? Have you thought of another way to fill this new column?

Herve · ‎01-13-2021

Having the model name and version appear on each line of the model scoring output dataset is actually the intent here. Since the results are appended each time the Evaluate recipe is run, we could potentially have a different model name and version on each line.
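To illustrate the appended case, here is a small pure-Python sketch in which only rows that do not yet carry model information are stamped, so earlier runs keep the name and version they were computed with. The `stamp_new_rows` helper, the column names `date` and `auc`, and the sample values are all made up for illustration; in a real recipe the rows would come from the metrics dataset.

```python
def stamp_new_rows(rows, model_name, model_version):
    """Return new rows where rows lacking model info receive the given values.

    Rows that already have a model_name are left unchanged, so metrics
    appended by earlier runs keep their original name and version.
    """
    stamped = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if not new_row.get("model_name"):
            new_row["model_name"] = model_name
            new_row["model_version"] = model_version
        stamped.append(new_row)
    return stamped


# Illustrative data only: one row from an earlier run, one just appended.
history = [
    {"date": "2021-01-12", "auc": 0.81, "model_name": "churn", "model_version": "v1"},
    {"date": "2021-01-13", "auc": 0.84},  # not stamped yet
]
result = stamp_new_rows(history, "churn", "v2")
```

With this approach each line ends up with the version that was active when it was appended, provided the stamping runs after every evaluation.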

Labels

Advanced ML
