Include some feature store functionalities.

Hi, the feature store concept is becoming more and more relevant in the MLOps world. It would indeed save a lot of time, because it prevents duplication of code and effort. It would be great if one of the next versions of DSS included this functionality. And, by the way, one of the best-known feature store frameworks, Feast, is even open source.

Thanks for the attention. Best Regards.

Giuseppe

8 Comments
CoreyS
Community Manager

Thank you for your idea @gnaldi62!

I just wanted to point out a few resources as well that I think will help add to it:

DSS Feature Store 

Reuse at Its Best: The Benefits of Feature Stores With Dataiku

Setting up Your Feature Store With Dataiku 

fsergot
Dataiker
Status changed to: Needs Info

Hello,

As previously mentioned, many existing capabilities in Dataiku products cover the notion of a Feature Store.

We do plan to look at additional capabilities. In this context, it would be interesting to know what you think is currently missing.

Thanks

tgb417
Neuron

@fsergot ,

I invite others to contribute to this conversation. However, over the shorter term, it might be helpful to have some training materials or knowledge base content from the Dataiku Academy explaining how to use and implement the current features that support the idea of a feature store.

—Tom

gnaldi62
Level 4

Hi, what I am missing from the implementation via DSS functionality is mainly a centralized, governed feature registry and discovery functionality.
According to the useful link mentioned by CoreyS, an ad-hoc connection should be defined for that, but it has to be maintained (together with the related scenarios) in a specifically tailored way. Two different users can define two different, overlapping feature stores. Also, at one of the Product Days someone presented a feature store for SQL, which sounds like a different way to do the same thing but in a more synthetic way (https://events.dataiku.com/product-days-december-2021/page/1968189/watch-on-demand).

Rgds.

Giuseppe

CoreyS
Community Manager

Thanks for following up @gnaldi62. Below is the presentation you are referencing:

 

This was actually given by @Marlan who may be able to provide some more context. He does go into some detail here as well.

Marlan
Neuron

Hello, I would agree with @gnaldi62 that feature registry and discovery functionality would be helpful. I'd think at least part of this could be done as a general enhancement to DSS functionality rather than being feature store specific.

For example, one piece would be the ability to add additional attributes to dataset columns (which in a feature store context could be used to store information like category, added date, version, etc.).
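
To make the column-attribute idea a bit more concrete, here is a minimal sketch in plain Python. It does not use any DSS API: the names `FeatureColumn` and `FeatureRegistry`, the sample dataset and column names, and the choice of attributes are all assumptions made for illustration, and a real registry would live in a shared, governed store rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class FeatureColumn:
    """Hypothetical metadata record for one dataset column."""
    dataset: str
    name: str
    category: str
    added: date
    version: int = 1
    description: str = ""


@dataclass
class FeatureRegistry:
    """Hypothetical in-memory registry keyed by (dataset, column name)."""
    _features: dict = field(default_factory=dict)

    def register(self, feature: FeatureColumn) -> None:
        key = (feature.dataset, feature.name)
        current = self._features.get(key)
        # Keep only the newest version of each (dataset, column) pair.
        if current is None or feature.version > current.version:
            self._features[key] = feature

    def search(self, category=None, text=None):
        """Return features matching a category and/or a text substring."""
        results = []
        for f in self._features.values():
            if category is not None and f.category != category:
                continue
            haystack = (f.name + " " + f.description).lower()
            if text is not None and text.lower() not in haystack:
                continue
            results.append(f)
        return sorted(results, key=lambda f: (f.dataset, f.name))


# Illustrative data only; these datasets and columns are made up.
registry = FeatureRegistry()
registry.register(FeatureColumn(
    "customer_features", "avg_basket_30d", "spend", date(2022, 1, 10),
    description="Average basket value over 30 days"))
registry.register(FeatureColumn(
    "customer_features", "days_since_last_order", "recency", date(2022, 1, 10)))
registry.register(FeatureColumn(
    "customer_features", "avg_basket_30d", "spend", date(2022, 2, 1),
    version=2, description="Average basket value over 30 days, returns excluded"))

spend_features = registry.search(category="spend")
```

Because the registry is keyed by dataset and column name, two users registering the same feature end up with a single entry instead of two overlapping definitions, which is one way to read the "centralized" requirement raised earlier in the thread.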

I get the idea of creating a connection that then is used via the catalog to select datasets and import them into a project. Then I guess you would use the dataset explorer tools to see contents. The additional column attributes could be viewed there (via settings, schema probably). This kind of works I guess. Seems like a lot of steps.

Maybe it would be easier to provide a recipe that enables display of features (with all the attributes) and then selection of those features that would get added to the source dataset. Maybe the recipe is what's feature store specific.
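
As a rough illustration of what such a recipe might do once features have been selected, here is a small sketch in plain Python. It is not a DSS recipe and uses no DSS API; the function name `add_features`, the key column, and the sample rows are hypothetical. It simply left-joins the chosen feature columns onto the source rows.

```python
def add_features(source_rows, feature_rows, key, selected):
    """Left-join the selected feature columns onto the source rows by key."""
    lookup = {row[key]: row for row in feature_rows}
    enriched = []
    for row in source_rows:
        match = lookup.get(row[key], {})
        new_row = dict(row)  # copy so the source rows are left untouched
        for name in selected:
            new_row[name] = match.get(name)  # None when no feature row exists
        enriched.append(new_row)
    return enriched


# Illustrative data only.
source = [
    {"customer_id": 1, "country": "IT"},
    {"customer_id": 2, "country": "FR"},
]
features = [
    {"customer_id": 1, "avg_basket_30d": 42.5, "days_since_last_order": 3},
]

enriched = add_features(source, features, "customer_id", ["avg_basket_30d"])
```

Only the selected column is added, and source rows without a matching feature row keep their place with an empty value, so the source dataset never loses records.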

Still would be nice to have a place to go to do the discovery (separate from the use). We developed a web app that does this pretty nicely but of course the drawback is that it isn't integrated into the step where one actually picks what they are going to use in their project.

These are just ideas... seems like this deserves some careful thought as to how best to approach this.

Marlan 

 

fsergot
Dataiker

Thank you all for your feedback.

I can see various improvements in all the answers. There seems to be a common concern around the notion of Feature Group (in Feature Store parlance): the ability to promote some datasets as curated and preferred datasets to use for modelling or enriching raw datasets.

This is something that can already be done by building a specific project in DSS where you process, clean and document the output datasets that become your reference source for modelling & enriching.

We do have in our backlog to improve both the ability to identify specific datasets as the preferred ones to use for modelling/enriching and the way shared objects are made accessible to all users across DSS.

There is also parallel work in progress to enrich our knowledge base by compiling everything written around the notion of Feature Store.

Do not hesitate to keep enriching this thread; I will keep reading and answering as best I can. In the meantime, I am moving this idea to our backlog and will keep you posted on progress.

fsergot
Dataiker
Status changed to: In Backlog