Managing Configuration Files Across Dataiku Instances for User-Accessible Settings

Solved!
tanguy

We're looking for a way to manage configuration settings across Dataiku instances, specifically settings that regular users can edit rather than admin-level ones. Our primary aim is to change these settings without disrupting the production environment and without requiring a deployment, which is what project "global variables" or a config file inside a project library currently force on us.

For example, within our organization, we have several Dataiku projects that regularly send reports via mailing lists. These mailing lists frequently evolve in response to organizational shifts. Presently, we store these mailing lists in the "Global variables" section, but any adjustments require deployment to take effect in the production environment.

Is there a centralized solution where these configuration changes can be established once and consistently applied across all instances (especially between a Design environment and the "attached" Automation Node(s))?


Operating system used: linux

1 Solution
Turribeach

With regard to email lists I think I will quote Steve Jobs: "you are holding it wrong" 🤣

There should be no need to change email lists in code. Create an email distribution list, give the business or the relevant people permission to update it, and use its SMTP address in Dataiku. In Exchange you can even change the Display Name of the distribution list and keep the same SMTP email address, or move the SMTP email address to a new distribution list. There should be no need to hardcode email distribution lists in Dataiku.

Having said that, I am sure your question applies to other config items, not just email distribution lists. We are using Global Variables at the moment, but my view is that we should move away from them. On one side, you have the hassle of redeploying them to all environments every time you make a change. On the other side, you can't keep secrets in them, since Globals are visible to all users, even clickers. On top of that, we have a lot of variables, which means they clutter the user's namespace when users load variables via get_custom_variables()/get_variables(), and they could even clash with users' code, as users have no way of knowing what these variables are called.

So where to store these then? In my view it would be great to use a secrets store to hold configuration values as well. I say this because I have secrets to store too (cloud keys, etc.), so I might as well put everything in one place and let the code retrieve the values at run time. This obviously adds a dependency on the secrets store, which means you should probably use a very robust, highly available, serverless service, such as the secrets store offerings from the cloud vendors. If you are worried about not being able to retrieve the config/secrets during an outage, you could cache them in a local file, but then you will need to encrypt this file, as it may contain secrets.
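
A minimal sketch of that run-time retrieval with a local cache fallback, in Python. Everything named here is hypothetical: `load_config`, `ConfigStoreError`, and the `fetch`, `encrypt` and `decrypt` callables are placeholders, not part of Dataiku or of any cloud SDK. `fetch` stands in for whatever client call your secrets store provides, and `encrypt`/`decrypt` for a real cipher of your choice.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class ConfigStoreError(Exception):
    """Raised when neither the remote store nor the local cache is usable."""


def load_config(
    fetch: Callable[[], Dict[str, str]],
    cache_path: Path,
    encrypt: Callable[[bytes], bytes],
    decrypt: Callable[[bytes], bytes],
) -> Dict[str, str]:
    """Return config from the store, falling back to an encrypted local cache.

    `fetch` is an assumed stand-in for a secrets-store client call.
    `encrypt`/`decrypt` are required on purpose: the cache may hold secrets,
    so the caller has to decide explicitly how the file is protected.
    """
    try:
        config = fetch()
    except Exception as exc:
        # The store is unreachable (outage, network, auth): try the cache.
        if not cache_path.exists():
            raise ConfigStoreError("store unavailable and no local cache") from exc
        raw = decrypt(cache_path.read_bytes())
        return json.loads(raw.decode("utf-8"))

    # The store answered: refresh the cache for the next outage.
    cache_path.write_bytes(encrypt(json.dumps(config).encode("utf-8")))
    return config
```

The idea would be to call `load_config(...)` once at the start of a recipe or scenario step and read values such as a recipient address from the returned dict, so that a change made in the store should be picked up on the next run without a redeployment.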

Hope it helps.

2 Replies
tanguy
Author

I notice that this request is closely related to the proposal of this product idea.