Hi fellow Dataikers,
We just got Dataiku, and the first big use case is already being planned. We will be doing a forecast in Dataiku and making the prediction results available to other departments in Power BI. We want to train models on different aggregation levels (probably using partitioned models). On the most granular level there will be so many individual models that it would be infeasible to train each one every day. Which is fine, because usually users will only be interested in the higher-level models.
But in case a user wants to look at a prediction on the most granular level, e.g. a prediction for a particular store/product_group combination, the respective model has to be trained on demand.
Does anyone have an idea how this could work across the Power BI/Dataiku interface? That is, how a Power BI user could trigger model training and prediction from within Power BI, either explicitly (e.g. by clicking a button) or implicitly (e.g. by requesting data that is not available because the model has not been trained yet)?
I know that one can use the Dataiku API to start training a particular ML task - but can you select a particular partition for training?
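One way to sidestep the question of per-partition training in the ML task API is to wrap the training in a scenario and trigger that scenario via the Python API client. The sketch below is hedged: `run_and_wait` accepting run parameters is part of the `dataikuapi` client, but the project key, scenario id, and the `"partition"` variable name are placeholders, and the scenario itself would have to read that variable and restrict its build/train steps to the matching partition.

```python
# Hedged sketch using the dataiku-api-client package (dataikuapi).
# Assumes a scenario exists that reads a "partition" run parameter and
# builds/trains only that partition of the partitioned model.

def partition_id(store, product_group):
    # Multi-dimension partition identifiers in Dataiku join the dimension
    # values with "|", e.g. "store_12|beverages"
    return "%s|%s" % (store, product_group)

def train_partition(host, api_key, project_key, scenario_id,
                    store, product_group):
    import dataikuapi  # pip install dataiku-api-client
    client = dataikuapi.DSSClient(host, api_key)
    scenario = client.get_project(project_key).get_scenario(scenario_id)
    # Run the training scenario for one partition and block until it finishes
    return scenario.run_and_wait(
        params={"partition": partition_id(store, product_group)})
```

Whether this is fast enough for on-demand use depends entirely on how long a single partition takes to train.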
I know that one can call an API from Power BI to get data while creating a report, but can that be refreshed dynamically (i.e. by a call to the API) when the user expands a particular part of the report?
Maybe Dataiku can expose something that looks like a SQL database to Power BI but is actually an interface that queries Dataiku to decide whether to return data immediately or train and predict with a model first?
Or could we put some service in between Power BI and Dataiku to make this scenario work?
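The in-between-service idea above can be sketched as a thin middleware that checks, per partition, how fresh the latest forecast is and triggers retraining only when needed. Everything here is illustrative: the 24-hour threshold, the idea of a `trained_at` timestamp stored per partition, and the commented route logic are assumptions, not an existing Dataiku feature.

```python
# Minimal freshness-gate sketch for a middleware between Power BI and Dataiku.
# Assumes each partition's forecast carries a trained_at timestamp; the
# 24-hour staleness threshold is purely illustrative.
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=24)

def needs_retraining(trained_at, now=None):
    """True if the partition's model has never been trained or is stale."""
    if trained_at is None:
        return True
    now = now or datetime.utcnow()
    return now - trained_at > MAX_AGE

# Inside the service (e.g. a Flask/FastAPI route), the logic would roughly be:
#   if needs_retraining(lookup_trained_at(partition)):
#       trigger the partition's Dataiku training scenario and wait
#   return the latest predictions for the partition as JSON
```

Power BI would then point at this service instead of at Dataiku directly, so the "request data that doesn't exist yet" case degrades into a slower first request rather than an error.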
I am thankful for any advice or ideas 🙂
Welcome to the Dataiku community!
This is a super interesting question and I love the idea - I might do something like this myself in the future, though I haven't done it to date.
When reading your post I was wondering what type of data source you were going to provide to Power BI coming out of Dataiku DSS. Were you going to build a database with Dataiku and let the Power BI users query it directly? Have Power BI replicate the Dataiku DSS data? Or have Dataiku deploy a REST API and have Power BI query that API for its data directly?
If Power BI is querying the database directly, I'm not clear whether there is any way to test the freshness of the forecast. However, I could imagine that with a custom REST API endpoint you could test the freshness of the forecast and rebuild it if it is not fresh enough. One of the key things that will determine the viability of this approach is how long the smaller forecasts take to compute. If a small forecast can be computed in a few seconds, this approach might work out well.
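The check-freshness-then-rebuild endpoint could live as a custom Python endpoint on the Dataiku API node. The shape below is a self-contained sketch: the in-memory `_FORECASTS` store and the `load_forecast`/`trigger_training` helpers are stand-ins for querying the real forecast store and running the partition's training scenario, and the function name and freshness threshold are assumptions.

```python
# Hedged sketch of a Python-function endpoint for the Dataiku API node.
# _FORECASTS and the two helpers are in-memory stand-ins so the sketch runs;
# on a real API node they would query the forecast store and kick off the
# partition's training scenario on the Automation node.
_FORECASTS = {}

FRESHNESS_HOURS = 24  # illustrative threshold

def load_forecast(partition):
    return _FORECASTS.get(partition)

def trigger_training(partition):
    # Placeholder: in reality, run the partition's training scenario, wait
    # for it, and refresh the forecast store. Here we just record a result.
    _FORECASTS[partition] = {"partition": partition, "age_hours": 0,
                             "values": [100, 105, 98]}

def api_py_function(store, product_group):
    # Multi-dimension partition ids in Dataiku join values with "|"
    partition = "%s|%s" % (store, product_group)
    forecast = load_forecast(partition)
    if forecast is None or forecast["age_hours"] > FRESHNESS_HOURS:
        trigger_training(partition)  # slow path: train on demand
        forecast = load_forecast(partition)
    return forecast
```

The caller (Power BI, or a middleware in front of it) only ever sees "fresh forecast, eventually" - the first request for an untrained partition is simply slower.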
All of this will require a full Dataiku license. You will need access to scenarios to build the larger models, partitions as you noted, and the API node to make this API available. A Power Query ("M") query can then hit the Dataiku REST API you have created directly and handle the local presentation of results. You will also need to figure out the security model for this new API: who can query which forecasts.
I'd reach out to your Customer Success contacts at Dataiku with this challenge and see what resources you can leverage there.
Let us know how you get on with your project. Also, is there anyone out there who has done a similar project, using a Dataiku API as a backend to a Power Query presentation? I'd love to hear more.