I am trying to use Dataiku to evaluate an implementation of a recommender system (RS), but it is hard to frame this RS as a standard ML problem.
To give you an idea of the scale of the data, I have about 6M transactions from 4M users on 200K products. Products have features, and some of them come from text analysis, so I could have a few thousand columns. I am trying to use a content-based recommender, so in machine learning terms my problem can be expressed as follows:
product features x user -> purchase or not
Here is the issue: it is not efficient to represent it that way. The features require thousands of columns, and combining every product with every user requires 4M x 200K rows, which is 800 billion, so that is not really feasible.
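To make the scale concrete, here is a quick back-of-the-envelope check in Python. It uses the rounded counts given above and assumes, as an upper bound, that every transaction is a distinct (user, product) pair:

```python
# Rounded counts from the description above.
n_users = 4_000_000
n_products = 200_000
n_transactions = 6_000_000

# One row per (user, product) pair in the "standard ML" layout.
dense_rows = n_users * n_products

# Upper bound on the share of rows that are actual purchases
# (assumes no user bought the same product twice).
density = n_transactions / dense_rows

print(f"{dense_rows:,} rows")         # 800,000,000,000 rows
print(f"{density:.2e} of rows are purchases")  # 7.50e-06 of rows are purchases
```

In other words, fewer than 1 row in 100,000 would be a positive example, and almost all of the table would be "no purchase" rows.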
I wrote a program that takes advantage of the sparsity of this matrix and outputs recommendations from a trained model, similar to a cosine-similarity RS. However, as far as I know, analyzing a machine learning algorithm in DSS through a plugin requires the data to be formatted in the very impractical way I described above.
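The idea is roughly along these lines. This is a simplified sketch and not my actual program: the toy data and the names `product_features`, `purchases`, `user_profile` and `recommend` are made up for illustration, and it uses plain Python dictionaries as sparse vectors.

```python
import math
from collections import defaultdict

# Hypothetical toy data: sparse product features stored as {feature: weight}.
product_features = {
    "p1": {"wool": 1.0, "winter": 1.0},
    "p2": {"wool": 1.0, "scarf": 1.0},
    "p3": {"cotton": 1.0, "summer": 1.0},
    "p4": {"winter": 1.0, "scarf": 1.0},
}
# Only observed transactions are stored, never the full user x product grid.
purchases = {"u1": ["p1"], "u2": ["p3"]}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_profile(user):
    """Sum the feature vectors of the products the user has bought."""
    profile = defaultdict(float)
    for product in purchases.get(user, []):
        for feature, weight in product_features[product].items():
            profile[feature] += weight
    return dict(profile)


def recommend(user, k=2):
    """Return up to k unseen products, ranked by similarity to the profile."""
    profile = user_profile(user)
    seen = set(purchases.get(user, []))
    scored = [
        (cosine(profile, features), product)
        for product, features in product_features.items()
        if product not in seen
    ]
    # Drop products with no overlap; break ties by product id for stability.
    scored = [(score, product) for score, product in scored if score > 0.0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [product for _, product in scored[:k]]


print(recommend("u1"))  # ['p2', 'p4']
```

The output is a short list of products per user, so the full user x product table is never materialized. This toy version still loops over every product for each user, which would need something smarter at 200K products.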
The Dataiku interface would be very useful for evaluating my solution. I intend to use various precision metrics, and not having to reimplement all of them would save me a lot of work.
Is there a way for me to use the Dataiku interface anyway?