
Governance: Who do you allow to push Designer projects to Automation nodes?

Solved!
Taylor
Level 2

This is more of an organizational and governance question than a Dataiku one, but maybe the "Unified Deployer" in the new 9.0.x version will help me here.

Currently, only DSS admins (me) in our org have access to migrate projects from Design to our Automation QA and then to Automation Prod instances. While this won't scale well into the future, it has worked fine so far for our needs. Some groups of end users are adamant about controlling their own deployments to QA / Prod. Is there a way to grant individuals the ability to move only their own projects to QA / Prod Automation nodes? I don't want to hold these power users back, but I'm not sure if I should make them admins of the entire ecosystem.

Does anyone else have a similar situation? How do you handle this?

Some options I'm exploring:

  • DSS is licensed per named user, not per number of instances. So I'm considering keeping a shared Design instance but setting up separate QA/Prod Automation nodes for this team, so they can manage their own deployment cycles.
  • Maybe this new "Unified Deployer" thing can grant certain groups on DSS the power-user ability to push only their own projects to QA / Prod Automation nodes? (To me, this would be the ideal solution I think)
  • Grant this group admin rights in our main QA / Prod Automation and hope for the best?

Side note: I refuse to use that deployment macro that has been floating around out there, specifically because of all the "exceptions" for when you can't use it or when it may result in undefined behavior.

1 Solution
fsergot
Dataiker

Hello,

I would say there are several axes to your topic, so there may not be a single way to answer all of them. Here are some thoughts for you:

How do you ensure that a deployment will work, will not cause broader issues, and that only the right person can do the right thing on the right project?
In DSS, you already have some good features you can put in place to make sure projects are tested (think scenarios, metrics & checks, or custom code to test your own code...). Because a project is packaged as a bundle, you are also covered on consistency: you are sure to take everything you need in one click.
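As a rough illustration of using checks as a deployment gate, here is a minimal Python sketch that decides whether a bundle may be promoted based on check results. It assumes the results have already been collected (for example at the end of a scenario run) into plain `(name, outcome)` pairs; the outcome labels `OK`, `WARNING`, `ERROR` and `EMPTY` mirror the statuses DSS shows for checks, but the function name and the "warnings are tolerated on QA only" rule are hypothetical.

```python
from typing import Iterable, List, Tuple

# Outcome labels as displayed for DSS checks; the policy built on them is hypothetical.
BLOCKING = {"ERROR"}
NEEDS_REVIEW = {"WARNING", "EMPTY"}


def gate_deployment(check_results: Iterable[Tuple[str, str]],
                    target: str) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons) for promoting a bundle to `target` ("qa" or "prod").

    Hypothetical rule: any ERROR blocks every target; WARNING or EMPTY
    outcomes are tolerated on QA but block a production deployment.
    """
    reasons = []
    for name, outcome in check_results:
        if outcome in BLOCKING:
            reasons.append(f"{name}: {outcome}")
        elif outcome in NEEDS_REVIEW and target == "prod":
            reasons.append(f"{name}: {outcome} (not allowed on prod)")
    return (not reasons, reasons)


if __name__ == "__main__":
    results = [("row_count_min", "OK"), ("null_rate", "WARNING")]
    print(gate_deployment(results, "qa"))    # (True, [])
    print(gate_deployment(results, "prod"))  # (False, ['null_rate: WARNING (not allowed on prod)'])
```

Keeping the gate as a pure function makes it easy to test on its own and to call from whichever step (scenario, CI job) actually performs the deployment.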

This is a topic where the new Deployer will definitely help, so you can give it a try. It adds clear security on who can deploy what on which infrastructure, and also gives central teams a way to see and monitor all projects. The main points of attention remain shared objects and configurations (like Spark, plugins, or connections).
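To make "who can deploy what on which infrastructure" concrete, here is a small, purely illustrative Python model of such a policy: each group is granted a set of projects and a set of target infrastructures, and a deployment is allowed only when one of the user's groups covers both. The group, project and infrastructure names are made up, and this is not how the Deployer stores its permissions; it only sketches the kind of rule you would configure there.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Grant:
    """Hypothetical grant: which projects a group may deploy, and where."""
    projects: Set[str] = field(default_factory=set)
    infrastructures: Set[str] = field(default_factory=set)


def can_deploy(user_groups: Set[str], grants: Dict[str, Grant],
               project_key: str, infrastructure: str) -> bool:
    """True if a single group of the user covers both the project and the target."""
    for group in user_groups:
        grant = grants.get(group)
        if grant and project_key in grant.projects and infrastructure in grant.infrastructures:
            return True
    return False


# Example policy (all names hypothetical): power users may push their own
# projects to QA and prod, while admins keep access to every project.
GRANTS = {
    "analytics_power_users": Grant({"CHURN_MODEL", "SALES_FORECAST"},
                                   {"automation-qa", "automation-prod"}),
    "dss_admins": Grant({"CHURN_MODEL", "SALES_FORECAST", "FINANCE_RISK"},
                        {"automation-qa", "automation-prod"}),
}

if __name__ == "__main__":
    print(can_deploy({"analytics_power_users"}, GRANTS, "CHURN_MODEL", "automation-prod"))   # True
    print(can_deploy({"analytics_power_users"}, GRANTS, "FINANCE_RISK", "automation-prod"))  # False
```

Requiring one group to hold both the project and the infrastructure is a deliberate choice here: it prevents a user from combining a project grant from one team with an infrastructure grant from another.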

How do you ensure the update is in line with your rules regarding compliance, fairness, and explainability?
This is an important topic when dealing with ML projects, and potentially a critical one depending on your company's rules and applicable regulation. Delegating such tasks is complex, and they may represent a fair amount of necessary work in your operationalization process.
DSS embeds tools to help you go faster and smarter on that; you can find some hints in Top 3 Dataiku Features for Transparent & Explainable AI. The Model fairness report and the Model Document Generator are a great start.

How do you ensure that the right process has been followed for moving to production?
DSS can help you ensure an adequate level of consistency in your project, but this broader governance topic needs to be clearly defined internally and communicated to people; that's probably the first task.
You can embed such rules in DSS to some extent using the new Deployer, but also using scenarios (e.g. you can automate deployments of bundles or API services with scenario steps and combine them with other steps, even custom ones that add your own logic).
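For the scenario route, a custom Python step could chain the bundle operations with your own logic. The sketch below assumes the `dataikuapi` package's `DSSProject` methods `export_bundle`, `download_exported_bundle_archive_to_file`, `import_bundle_from_archive`, `preload_bundle` and `activate_bundle`, and assumes the project already exists on the Automation node; the `approved` callback standing in for your internal sign-off rule is hypothetical. The project handles are passed in, so the logic can be exercised with stand-ins before pointing it at real instances.

```python
import os
import tempfile
from typing import Callable


def promote_bundle(design_project, automation_project, bundle_id: str,
                   approved: Callable[[str], bool]) -> str:
    """Export a bundle from Design, then import, preload and activate it on Automation.

    `design_project` and `automation_project` are expected to behave like
    dataikuapi DSSProject handles. `approved` is a hypothetical hook for your
    own governance rule (ticket approved, checks passed, etc.).
    """
    if not approved(bundle_id):
        raise PermissionError(f"Bundle {bundle_id} has not been approved for deployment")

    # Create the bundle on the Design node and copy the archive locally.
    design_project.export_bundle(bundle_id)
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = os.path.join(tmp, f"{bundle_id}.zip")
        design_project.download_exported_bundle_archive_to_file(bundle_id, archive_path)
        # Import while the temporary archive still exists.
        automation_project.import_bundle_from_archive(archive_path)

    # Preload the imported bundle, then switch the Automation project over to it.
    automation_project.preload_bundle(bundle_id)
    automation_project.activate_bundle(bundle_id)
    return bundle_id
```

With real instances, each handle would come from something like `dataikuapi.DSSClient(host, api_key).get_project("MY_PROJECT")` on the corresponding node, with your own host names, API keys and project key.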

A few additional thoughts on that:

  • An alternative approach could be to automate the whole operationalization flow using a CI/CD tool like Jenkins (see Building a Jenkins pipeline for Dataiku DSS). You can then let users 'just' trigger the pipeline, so that you are 100% sure all the rules are enforced on such projects (however, do not underestimate the workload to build and maintain such pipelines).
  • You may want to consider applying different rules to different projects, as they do not all require the same level of skills or regulation. Let the 'lighter' projects have a lighter process and focus on the most complex/critical ones. Here too, the new Deployer can help, as you can set different rights per project.
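The "lighter process for lighter projects" idea can be expressed as a mapping from a project tier to the pipeline stages it must pass, whether those stages run as Jenkins steps or as DSS scenario steps. This Python sketch is hypothetical throughout: the tier names, stage names and stage functions are placeholders for whatever your organization actually requires.

```python
from typing import Callable, Dict, List

# Hypothetical tiers and stage names; in a CI/CD tool each stage would be a pipeline step.
REQUIRED_STAGES: Dict[str, List[str]] = {
    "light": ["run_checks", "deploy_prod"],
    "standard": ["run_checks", "deploy_qa", "run_qa_scenarios", "deploy_prod"],
    "critical": ["run_checks", "deploy_qa", "run_qa_scenarios",
                 "fairness_review", "manual_sign_off", "deploy_prod"],
}


def run_pipeline(project_key: str, tier: str,
                 stages: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Run the stages required for `tier` in order, stopping at the first failure.

    Each stage function receives the project key and returns True on success.
    Returns the names of the stages that completed successfully.
    """
    completed = []
    for stage_name in REQUIRED_STAGES[tier]:
        if not stages[stage_name](project_key):
            break
        completed.append(stage_name)
    return completed


if __name__ == "__main__":
    always_ok = lambda key: True
    stages = {name: always_ok for names in REQUIRED_STAGES.values() for name in names}
    stages["manual_sign_off"] = lambda key: False  # sign-off still pending
    print(run_pipeline("SALES_FORECAST", "light", stages))
    # ['run_checks', 'deploy_prod']
    print(run_pipeline("FINANCE_RISK", "critical", stages))
    # ['run_checks', 'deploy_qa', 'run_qa_scenarios', 'fairness_review']
```

Users only ever trigger `run_pipeline`; which stages apply is decided centrally by the tier, so a critical project cannot reach `deploy_prod` without its sign-off stage succeeding.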

Hope this helps 🙂


5 Replies
Marlan
Neuron

Hi @Taylor,

Good question. Everyone on my team (albeit a fairly small one) does their own deployments. As we add teams to the platform, we are considering adding additional automation instances for each team to use. Currently, we are thinking that we'd continue with each person doing their own deployments. 

What is your concern about expanding access to who is able to deploy projects? I'm just curious. Limiting who can do deployments seems like an eminently reasonable approach, and we did discuss doing that a while back but ended up deciding not to do it that way.

Marlan

Taylor
Level 2
Author
Yeah, that's a totally fair question, as I'm kind of rethinking my approach to all of this as well. I'd say I'm in this mindset mostly due to the relatively traditional / strict culture of governance, change control, and deployment schedules in the IT department where I work. I'm definitely starting to lean towards enabling more "power-user" type people to do their own deployments if they prefer, especially since that would scale much better.
There's also the concern of data sensitivity. We're in a highly regulated industry with a lot of audit and data security regulation around who can see what.
I am kind of liking the idea of setting up an additional Automation node so I can really set this other team free though.

Have you tested out the Unified Deployer thing that is new in DSS 9.0.x yet? I think that's next on my list to test out. If that behaves how I'm hoping, then this issue might even already be solved. We're still on 8.0.3 for all of our instances.
Marlan
Neuron

No, I haven't tried the v9 deployer yet. It sounds like it will give us some additional options on this front which is great.

Coincidentally, we are working on a new process to deploy new tables to the production tier of our database platform. We are planning to use scenarios to execute CREATE TABLE scripts and run them under a new service account set up for this purpose. Previously, table deployment happened outside of DSS; we are bringing it into DSS to simplify the process and align table deployments with project deployments. Currently, we are planning for individuals to do their own table deployments as part of their project deployments.
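A table-deployment step like this can be kept safe to re-run by making the DDL idempotent, so a scenario can execute it as part of every project deployment. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the production database, and the table definition is invented; on a real platform the connection would be opened under the dedicated service account and the DDL adjusted to that database's syntax.

```python
import sqlite3
from typing import List

# Hypothetical DDL shipped with the project; IF NOT EXISTS makes re-runs harmless.
DDL_SCRIPTS: List[str] = [
    """
    CREATE TABLE IF NOT EXISTS churn_scores (
        customer_id INTEGER PRIMARY KEY,
        score REAL NOT NULL,
        scored_at TEXT NOT NULL
    )
    """,
]


def deploy_tables(conn: sqlite3.Connection, scripts: List[str]) -> List[str]:
    """Execute each DDL script, commit, and return the names of existing tables."""
    for ddl in scripts:
        conn.execute(ddl)
    conn.commit()
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(deploy_tables(conn, DDL_SCRIPTS))  # ['churn_scores']
    print(deploy_tables(conn, DDL_SCRIPTS))  # ['churn_scores'] (second run changes nothing)
```

Returning the resulting table list gives the calling scenario something simple to check or log after each deployment.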

Marlan

Taylor
Level 2
Author
Yes, this is definitely very helpful! I appreciate you framing the different topics like this. I thought I had explored all the Knowledge Hub articles, but the ability to leverage Jenkins for CI/CD is news to me! That is very interesting; I will definitely explore it further, thanks!