Pre-deployment project evaluation test.
Hello,
I'd like to make a suggestion on a subject I keep having to work on in the various IT departments where I've been able to roll out the Dataiku solution.
A managed macro, available on the store, that users who own a project could run in order to find the most appropriate way of developing their project before they have to bundle it and request a deployment (a so-called 'eligibility test for production' in my last experience).
I'm sure you're going to tell me that this is too broad and needs to be configured per user or according to the platform administrator's policy.
However, one could envisage a macro with fields to fill in covering all the standard criteria for good use of Dataiku concepts.
What I mean by this is a simple scan to identify the objects in a flow that, for example, run on the local stream engine instead of pushing computation down to the database where the recipe's input/output datasets already live. In other words, to check that the best suggested engine is actually being used, as in the sketch below.
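To make this concrete, here is a minimal sketch of what such an engine scan could look like with the dataikuapi client. The host, API key and project key are placeholders, and the exact accessors (the shape of list_recipes() items, get_selected_engine_details()) should be verified against your DSS version:

```python
import dataikuapi

# Hypothetical host, API key and project key; replace with your own.
client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
project = client.get_project("MY_PROJECT")

for item in project.list_recipes():
    name = item["name"]  # older versions return dicts; newer ones, list items
    status = project.get_recipe(name).get_status()
    engine = status.get_selected_engine_details()  # details of the engine DSS selected
    if engine.get("type") == "DSS":  # "DSS" = the local stream engine
        print(f"Recipe {name} runs on the local DSS engine; "
              "check whether it could be pushed down to the input database.")
```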
The same goes for partitions: making sure they are consistent with each other across consecutive processing steps.
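A partition check could follow the same pattern: compare the partitioning dimensions of two consecutive datasets in the flow. The dataset names below are hypothetical, and I'm assuming the dimensions sit under the "partitioning" key of the raw dataset settings:

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
project = client.get_project("MY_PROJECT")

def partition_dims(dataset_name):
    """Return the names of a dataset's partitioning dimensions (empty if unpartitioned)."""
    raw = project.get_dataset(dataset_name).get_settings().get_raw()
    return [d.get("name") for d in raw.get("partitioning", {}).get("dimensions", [])]

# Hypothetical names for two consecutive datasets in the flow.
if partition_dims("step_input") != partition_dims("step_output"):
    print("Partitioning schemes differ between step_input and step_output.")
```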
Or again, since we know a good project is built automatically by a scenario: check that the main build scenario includes a reporter notifying the owner on failure or success. Does the project have data quality safeguards on the build of the project's output tables?
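And the reporter check might look like this. "MAIN_BUILD" is a hypothetical scenario id, and I'm assuming the raw_reporters accessor and the "active" flag on each reporter definition; both should be checked against your DSS version:

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
project = client.get_project("MY_PROJECT")

settings = project.get_scenario("MAIN_BUILD").get_settings()
reporters = settings.raw_reporters  # raw list of reporter definitions
if not any(r.get("active", False) for r in reporters):
    print("The main build scenario has no active reporter for failure/success.")
```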
We could also look at other standards and conventions, but those are more customer-specific, so I understand we'd need to integrate such points ourselves.
Ultimately, it would be a matter of having a scan of the proper use of Dataiku's basic concepts.
I know the elements above are a very classic first list of suggested scans, but from my own experience these criteria are systematically requested by clients starting out on the DSS platform.
Comments
-
Hi @Grixis,
This is a great idea indeed.
As of today, there are already checks that you can perform at deployment time using Custom hooks.
This happens at deployment time, so quite late in the project lifecycle. But it is also when we know where the deployment will occur, which allows for very precise rules.
However, we are thinking of adding some checks that would be runnable on the Design node, so much earlier. You would not know the target infrastructure, but you would have access to the whole project, making it easier to inspect the project and control its content.
-
Turribeach
Certainly agree that this is a good idea, but in general I tend not to request nor support ideas that can be implemented using the comprehensive Dataiku Python API. We do some of the things you are asking for and validate projects for consistency. We are now building a framework that will detect bad practices and inform our users directly via email in an automated way. We call this the Dataiku Consistency Checker, and it covers not only deployment-related things but operational ones too: constantly failing scenarios, inconsistent objects due to deleted references or incomplete definitions, triggers that are set to fire too often, triggers that fire in a hot minute of the hour (usually around :00), etc. In the future we also want to include package vulnerabilities in code envs, unused code envs, unused projects, unused datasets and much more. All of these can be detected using the Python API, so the possibilities are endless; see the sketch below for one such rule.
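For example, the constantly-failing-scenarios rule is only a few lines with the public API. This is just a sketch, not our actual implementation; host, key and project are placeholders, and the get_last_runs() signature and the outcome accessor should be verified against your DSS version:

```python
import dataikuapi

client = dataikuapi.DSSClient("https://dss.example.com", "API_KEY")
project = client.get_project("MY_PROJECT")

for item in project.list_scenarios():
    scenario = project.get_scenario(item["id"])
    runs = scenario.get_last_runs(limit=5, only_finished_runs=True)
    outcomes = [r.outcome for r in runs]  # e.g. "SUCCESS" or "FAILED"
    if outcomes and all(o == "FAILED" for o in outcomes):
        print(f"Scenario {item['id']} failed its last {len(outcomes)} runs.")
```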
The reason I don't think this should be part of the product is that it's very niche and custom for each Dataiku customer. Furthermore, this sort of thing appeals only to large installations with lots of users, so the majority of Dataiku customers would probably not use a feature like the one you requested. So my advice would be to get your hands dirty with the Python API and develop your idea to suit your needs. You can then use a Plugin, Scenario or WebApp to execute it, so there is nothing really stopping you from fulfilling this requirement.
This is perhaps one of the great strengths of Dataiku: when something doesn't work OOTB the way you want, you can usually extend Dataiku using APIs/code to achieve exactly what you want.