Include some feature store functionalities.
Hi, the feature store concept is becoming more and more relevant in the MLOps world. It would save a lot of time because it helps prevent duplication of code and effort. It'd be great if one of the next versions of DSS included this functionality. And, btw, one of the best-known feature store frameworks, Feast, is even open source.
Thanks for the attention. Best Regards.
Giuseppe
Comments
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Thank you for your idea @gnaldi62! I just wanted to point out a few resources as well that I think will help add to it:
Reuse at Its Best: The Benefits of Feature Stores With Dataiku
-
Hello,
As previously mentioned, there are many existing capabilities in Dataiku products that cover the notion of Feature Store.
We do plan to look at additional capabilities. In this context, it would be interesting to know what you think is currently missing.
Thanks
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
@fsergot, I invite others to contribute to this conversation. Over the shorter term, however, it might be helpful if there were some training materials or knowledge base content from the Dataiku Academy covering how to use and implement the current features that are focused on the idea of a feature store.
—Tom
-
gnaldi62 Partner, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 79 Neuron
Hi, what I am missing from the implementation via DSS functionality is mainly a centralized, governed feature registry and discovery functionality.
According to the useful link mentioned by CoreyS, an ad-hoc connection should be defined for that, but it would have to be maintained (together with the related scenarios) in a specifically tailored way. Two different users can define two different, overlapping feature stores. Also, at one of the Product Days someone presented a feature store for SQL, which sounds like a different way to do the same thing but in a more concise way (https://events.dataiku.com/product-days-december-2021/page/1968189/watch-on-demand).
Rgds.
Giuseppe
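To make the overlap concern concrete, here is a minimal sketch in plain Python of what a central registry could enforce: one owning project per feature name, so a second, conflicting definition is caught at registration time. The `FeatureRegistry` class, its methods, and the project and feature names are all hypothetical illustrations, not part of DSS or any existing API.

```python
class FeatureRegistry:
    """Hypothetical central registry: one owning project per feature name."""

    def __init__(self):
        self._owners = {}  # feature name -> owning project

    def register(self, feature, project):
        """Record ownership, rejecting a second definition from another project."""
        owner = self._owners.get(feature)
        if owner is not None and owner != project:
            raise ValueError(f"{feature!r} is already defined by project {owner!r}")
        self._owners[feature] = project

    def owner_of(self, feature):
        """Return the owning project, or None if the feature is unknown."""
        return self._owners.get(feature)


registry = FeatureRegistry()
registry.register("cust_tenure_months", "PROJECT_A")  # illustrative names

try:
    registry.register("cust_tenure_months", "PROJECT_B")
    conflict = None
except ValueError as err:
    conflict = str(err)

print(conflict)
```

Without a shared registry of this kind, both projects would publish their own version of the feature and nothing would flag the duplication.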
-
CoreyS Dataiker Alumni, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Registered Posts: 1,150 ✭✭✭✭✭✭✭✭✭
Thanks for following up @gnaldi62. Below is the presentation you are referencing: //play.vidyard.com/qTdmbJQexsEqgkLz4KCKBH.html?
This was actually given by @Marlan, who may be able to provide some more context. He does go into some detail here as well.
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 320 Neuron
Hello, I would agree with @gnaldi62 that feature registry and discovery functionality would be helpful. I'd think at least part of this could be done as a general enhancement to DSS functionality rather than being feature store specific. For example, one piece would be the ability to add additional attributes to dataset columns (which in a feature store context could be used to store information like category, added date, version, etc.).
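As a rough illustration of the column-attribute idea, the sketch below models a column with the extra attributes mentioned (category, added date, version) and a simple filter for discovery. It is plain Python, independent of DSS; the `FeatureColumn` class, the `find_features` helper, and the sample columns are hypothetical and only show the shape such metadata might take.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeatureColumn:
    """One dataset column plus hypothetical feature-store attributes."""
    name: str
    dtype: str
    category: str          # e.g. "customer" or "billing"
    added: date            # date the feature was first published
    version: int = 1
    description: str = ""


def find_features(columns, category=None, text=None):
    """Return columns matching a category and/or a case-insensitive text search."""
    matches = []
    for col in columns:
        if category is not None and col.category != category:
            continue
        haystack = (col.name + " " + col.description).lower()
        if text is not None and text.lower() not in haystack:
            continue
        matches.append(col)
    return matches


# Illustrative catalog entries, made up for this example.
catalog = [
    FeatureColumn("cust_tenure_months", "int", "customer", date(2022, 1, 10),
                  description="Months since first purchase"),
    FeatureColumn("avg_invoice_90d", "double", "billing", date(2022, 3, 2),
                  version=2, description="Average invoice amount over 90 days"),
    FeatureColumn("cust_region", "string", "customer", date(2021, 11, 5),
                  description="Sales region of the customer"),
]

customer_features = find_features(catalog, category="customer")
print([c.name for c in customer_features])
```

Filtering on such attributes is the kind of lookup a discovery screen or recipe could offer before features are pulled into a project.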
I get the idea of creating a connection that then is used via the catalog to select datasets and import them into a project. Then I guess you would use the dataset explorer tools to see contents. The additional column attributes could be viewed there (via settings, schema probably). This kind of works I guess. Seems like a lot of steps.
Maybe it would be easier to provide a recipe that enables display of features (with all the attributes) and then selection of those features that would get added to the source dataset. Maybe the recipe is what's feature store specific.
Still would be nice to have a place to go to do the discovery (separate from the use). We developed a web app that does this pretty nicely but of course the drawback is that it isn't integrated into the step where one actually picks what they are going to use in their project.
These are just ideas... seems like this deserves some careful thought as to how best to approach this.
Marlan
-
Thank you all for your feedback.
I can see various improvements in all the answers. There seems to be a common concern around the notion of Feature Group (in Feature Store parlance): the ability to promote some datasets as curated and preferred datasets to use for modelling or enriching raw datasets.
This is something that can already be done by building a specific project in DSS where you process, clean and document the output datasets that become your reference source for modelling & enriching.
Our backlog includes improving both the ability to identify specific datasets as the preferred ones to use for modelling/enriching and how shared objects are made accessible to all users across DSS.
There is also parallel work in progress to enrich our knowledge base by compiling everything written about the notion of Feature Store.
Do not hesitate to keep on enriching this thread, I will keep on reading and answering as best as I can. In the meantime, I am marking this idea as in our backlog and will keep you posted on progress.
-
We listened to you! We're proud to announce that Dataiku 11 now comes with its feature store.
Check out this video to get a quick look at it:
https://content.dataiku.com/dataiku-11/feature-store
For more details about it you can read the documentation or the how-to:
https://doc.dataiku.com/dss/latest/mlops/feature-store/index.html
https://knowledge.dataiku.com/latest/kb/collaboration/how-to-feature-store.html
Alternatively, to discover the feature, you can follow this hands-on tutorial to build your feature store in Dataiku:
https://knowledge.dataiku.com/latest/kb/o16n/feature-store/features-store-overview.html