Custom top-N model

Erlebacher

I am having issues with my custom model. From what I read, custom models should follow the `scikit-learn` estimator API, and I should make sure my model has a `classes_` attribute. However, my model is a top-N recommender built with `rankfm`, a factorization-machine library (Python/C) available on GitHub. I have installed all the relevant packages in my code environment. The custom-model options I am offered cover clustering, classification, and regression, but my problem is none of these. How do I create a custom model that can be integrated into a Dataiku workflow? I want my prediction function to output the top-N recommendations as a DataFrame. I have gone through various tutorials but found none that address my use case. Any help is appreciated. Thanks.
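Whatever model produces the scores, the final step described above (top-N recommendations as a DataFrame) can be written independently of Dataiku. The helper below is a minimal sketch that assumes you already have a users × items score matrix; the function name and column names are hypothetical, and if your library already returns ranked recommendations directly, this step is unnecessary.

```python
import numpy as np
import pandas as pd


def top_n_from_scores(scores, user_ids, item_ids, n):
    """Turn a (users x items) score matrix into a long top-N DataFrame.

    Hypothetical helper: `scores` is any 2-D array-like where a higher value
    means a more relevant item for that user (e.g. model predictions).
    """
    scores = np.asarray(scores, dtype=float)
    n = min(n, scores.shape[1])
    # Sorting the negated scores gives item indices best-first;
    # a stable sort keeps tied items in their original column order.
    order = np.argsort(-scores, axis=1, kind="stable")[:, :n]
    rows = []
    for u, user in enumerate(user_ids):
        for rank, j in enumerate(order[u], start=1):
            rows.append((user, item_ids[j], rank, scores[u, j]))
    return pd.DataFrame(rows, columns=["user_id", "item_id", "rank", "score"])
```

The long format (one row per user and recommended item) is easy to write to a dataset and to join against held-out interactions for evaluation.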

Gordon


Operating system used: macOS Ventura

Answers

  • HarizoR (Dataiker)

    Hi,

    Recommender systems are a specific class of ML problems that do not fit the usual classification/regression/clustering framework. The simplest alternative is to write your code directly in a Python recipe or in a notebook.

    Best,

    Harizo

  • Erlebacher

    I have been able to run my algorithm inside a multiclass custom-code model. However, my question is this: how do I hook up an output dataset to the custom code? More specifically, I wish to write a predict() function with my own output format. I want to be able to use Dataiku's evaluation tools and have my predictions flow downstream in the Flow. Thanks.
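    Outside the visual ML screen, the usual way to send predictions downstream is a Python recipe whose output is a dataset. The sketch below assumes hypothetical dataset names (`train_interactions`, `users_to_score`, `recommendations`) and uses a simple popularity ranking as a stand-in for the real `rankfm` model; only `dataiku.Dataset(...).get_dataframe()` and `write_with_schema(...)` come from the Dataiku API.

```python
import pandas as pd

try:
    import dataiku  # only importable inside Dataiku DSS
except ImportError:
    dataiku = None


def recommend_popular(train, users, n=10):
    """Hypothetical popularity baseline standing in for a real model such as rankfm.

    train: DataFrame with user_id and item_id columns.
    Returns one row per (user, recommended item), skipping items the user already has.
    """
    popularity = train["item_id"].value_counts()  # most frequent items first
    seen = train.groupby("user_id")["item_id"].apply(set)
    rows = []
    for user in users:
        already = seen.get(user, set())  # unseen users get an empty history
        picks = [item for item in popularity.index if item not in already][:n]
        for rank, item in enumerate(picks, start=1):
            rows.append((user, item, rank, int(popularity[item])))
    return pd.DataFrame(rows, columns=["user_id", "item_id", "rank", "score"])


def run_recipe(n=10):
    # Dataset names are hypothetical: use your recipe's actual input/output names.
    train = dataiku.Dataset("train_interactions").get_dataframe()
    users = dataiku.Dataset("users_to_score").get_dataframe()["user_id"]
    recs = recommend_popular(train, users, n=n)
    # Writing to the recipe's output dataset makes the predictions available downstream.
    dataiku.Dataset("recommendations").write_with_schema(recs)


if dataiku is not None:
    run_recipe()
```

Fit and predict happen in the same recipe here; splitting them across two recipes requires persisting the fitted model between them.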

  • Erlebacher

    Hi @HarizoR. I have written my code in a recipe and in a notebook. The reason I tried the custom model is that my model has a fit() stage and a predict() stage. The fit stage reads the training data, and the predict stage reads a validation file and produces a new DataFrame. How do I combine the fitting and prediction stages when I use my own recipes? Are there any resources on that?

    Finally, is it possible to use Dataiku's evaluation tools (AUC, precision, etc.) if I rely entirely on my own recipes and bypass the custom model? I have not yet come across an existing example of that kind. Any hints would be appreciated.
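    The visual ML metrics such as AUC are designed around classification and regression models, so for a recommender one option is to compute ranking metrics yourself in a recipe and write them to a dataset. A minimal sketch of precision@k and recall@k, assuming the recommendations and the held-out interactions are DataFrames with the (hypothetical) columns described in the docstring:

```python
import pandas as pd


def precision_recall_at_k(recs, holdout, k):
    """Mean precision@k and recall@k over the users present in the holdout set.

    recs: DataFrame with user_id, item_id, rank (1 = best recommendation).
    holdout: DataFrame of (user_id, item_id) pairs the users actually interacted with.
    """
    top_k = recs[recs["rank"] <= k]
    recommended = top_k.groupby("user_id")["item_id"].apply(set)
    relevant = holdout.groupby("user_id")["item_id"].apply(set)
    precisions, recalls = [], []
    for user, truth in relevant.items():
        hits = len(recommended.get(user, set()) & truth)
        precisions.append(hits / k)        # share of the k slots that were relevant
        recalls.append(hits / len(truth))  # share of relevant items that were retrieved
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```

Users with no recommendations count as zero hits; an empty holdout set would cause a division by zero, so filter that case out beforehand.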

  • HarizoR (Dataiker)

    Hi,

    Custom models are usually a good fit when your code differs only slightly from a standard scikit-learn estimator. In your case you need more flexibility, hence the alternative of writing the whole pipeline in Python recipes/notebooks.

    Since Dataiku 11, you can train complex custom models entirely in code and still get the built-in model result visualizations, thanks to the new experiment tracking and MLflow model import features. By making your code MLflow-compatible (see example here), you should be able to retrieve the insights you are looking for.
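    An MLflow-compatible custom model is typically a subclass of `mlflow.pyfunc.PythonModel` that implements `predict(context, model_input)`. Below is a minimal sketch of such a wrapper returning a top-N DataFrame; the class name, the precomputed `user_rankings` dict, and the fallback list are assumptions for illustration, and the `object` fallback only exists so the sketch can run without MLflow installed.

```python
import pandas as pd

try:
    from mlflow.pyfunc import PythonModel
except ImportError:  # lets the sketch be exercised without MLflow installed
    PythonModel = object


class TopNRecommender(PythonModel):
    """Hypothetical pyfunc wrapper around an already-fitted recommender.

    user_rankings: dict mapping user_id -> list of item_ids, best first
    (for example, precomputed from a fitted rankfm model).
    fallback: ranking used for users not seen during training.
    """

    def __init__(self, user_rankings, fallback, n=10):
        self.user_rankings = user_rankings
        self.fallback = list(fallback)
        self.n = n

    def predict(self, context, model_input, params=None):
        # model_input: DataFrame with a user_id column.
        # Output: one row per (user, recommended item) instead of one label per row.
        rows = []
        for user in model_input["user_id"]:
            ranking = self.user_rankings.get(user, self.fallback)
            for rank, item in enumerate(ranking[: self.n], start=1):
                rows.append((user, item, rank))
        return pd.DataFrame(rows, columns=["user_id", "item_id", "rank"])
```

Because `predict` is free to return any DataFrame, the top-N output format is not constrained by the classification/regression shapes of the visual custom-model screen.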

    Best,

    Harizo

  • Erlebacher

    Thanks. That is the approach I stuck with. But I could not figure out how to save the model after training so that I could later use it for scoring. Has that been addressed? People told me earlier that I needed a custom algorithm. Thanks.
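    One way to carry a fitted model from a training step to a later scoring step is to serialize it with `pickle` and store the bytes, for example in a managed folder. A minimal sketch, with a hypothetical `PopularityModel` standing in for the real model; the managed-folder calls in the trailing comments illustrate usage inside DSS (folder name hypothetical) and are not exercised here.

```python
import pickle


class PopularityModel:
    """Hypothetical fitted model; any picklable object is saved the same way."""

    def __init__(self):
        self.ranking = []

    def fit(self, item_ids):
        counts = {}
        for item in item_ids:
            counts[item] = counts.get(item, 0) + 1
        # Most frequent first; ties broken by item id for a deterministic order.
        self.ranking = sorted(counts, key=lambda i: (-counts[i], i))
        return self

    def recommend(self, n):
        return self.ranking[:n]


def model_to_bytes(model):
    return pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)


def model_from_bytes(blob):
    # Only unpickle data you produced yourself: unpickling can execute arbitrary code.
    return pickle.loads(blob)


# Inside DSS, the bytes could live in a managed folder:
#   folder = dataiku.Folder("models")
#   with folder.get_writer("model.pkl") as w:           # training recipe
#       w.write(model_to_bytes(model))
#   with folder.get_download_stream("model.pkl") as s:  # scoring recipe
#       model = model_from_bytes(s.read())
```

`pickle` stores classes by reference, so the scoring environment must be able to import the same class and library versions; MLflow's pyfunc format, which serializes the model with cloudpickle and records requirements, is one way to make that handoff more self-describing.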

  • Erlebacher

    Hi @HarizoR,

    Your solution makes sense. In the docs, I read:

    This section focuses on the deployment through the API. It assumes that you already have a MLflow model in a model_directory, i.e. a local folder on the local filesystem, or a Managed Folder.

    So I am back to my original problem: how do I get an MLflow model into a model directory? My original question was how to save a model, and nobody could answer it at the time, even though I was already using Dataiku 11.
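    `mlflow.pyfunc.save_model` creates this kind of local model directory: an `MLmodel` descriptor plus the wrapped `python_model`, which MLflow serializes with cloudpickle. A minimal sketch, assuming MLflow is installed in the code environment; `PopularItems` and `export_model` are hypothetical names, and the subsequent step of importing the directory into Dataiku is not shown here.

```python
import pandas as pd

try:
    import mlflow.pyfunc
    from mlflow.pyfunc import PythonModel
except ImportError:  # MLflow not installed: the class can still be called directly
    mlflow = None
    PythonModel = object


class PopularItems(PythonModel):
    """Hypothetical pyfunc model: recommends the same fitted ranking to every user."""

    def __init__(self, ranking, n=10):
        self.ranking = list(ranking)
        self.n = n

    def predict(self, context, model_input, params=None):
        rows = [
            (user, item, rank)
            for user in model_input["user_id"]
            for rank, item in enumerate(self.ranking[: self.n], start=1)
        ]
        return pd.DataFrame(rows, columns=["user_id", "item_id", "rank"])


def export_model(model_dir, ranking):
    """Write an MLflow model directory (MLmodel file + serialized python_model).

    model_dir should be a path that does not exist yet; MLflow raises an
    error if it already contains files. Explicit pip_requirements skip
    MLflow's automatic dependency inference.
    """
    mlflow.pyfunc.save_model(
        path=model_dir,
        python_model=PopularItems(ranking),
        pip_requirements=["mlflow", "pandas"],
    )
    return model_dir
```

The resulting folder can be reloaded with `mlflow.pyfunc.load_model(model_dir)` to check that `predict` still returns the expected DataFrame before uploading it anywhere.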

    Is MLflow (experimental) available in the free version of Dataiku? Thanks.

  • Erlebacher

    I think I understand. I train the model on my laptop, save it with pickle, and use the MLflow API (assuming it is available to me in the free version of Dataiku 11.x) to upload it to Dataiku. If so, that makes sense.
    I noticed your use of cloudpickle, which I had not heard of before.

    Cheers,

    Gordon
