code env management - best practices?
I was wondering if Dataiku or organizations utilizing Dataiku have any best practice recommendations regarding code environment management.
In our organization, every user has the ability to create code environments ("code envs" for brevity). This has led to a proliferation of hundreds of code envs, and the number continues to grow over time. The count of deprecated code envs keeps rising as new Python/R libraries are introduced, as well as due to user turnover (new users create their own code envs instead of reusing those of previous users). This situation leads to disk saturation and makes it difficult to rebuild those code envs when migrating to a new Dataiku instance.
Some suggestions have been put forward, such as appointing "code env referees" who would act as the sole administrators of code envs, responsible for consolidating them. However, this approach presents two disadvantages:
- Users would lose flexibility as they would no longer have the freedom to create or modify their own code envs.
- Modifying an existing code env could become challenging, especially when it is used by multiple DSS projects that may introduce incompatible dependencies.
Can anyone provide more effective recommendations for best practices in this context?
Best Answer
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @tanguy,
It's hard to speak to best practices here, since this largely depends on each organization. But we did introduce a new feature in DSS 12 that can help with the management of code envs:
https://doc.dataiku.com/dss/latest/release_notes/12.html#request-code-env-setup
By allowing users to request a code env directly, admins can review each request and decide whether updating an existing code env is adequate instead of creating a new one, e.g. if there is already a code env with all of the required packages that the user could use.
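As a rough illustration of that review step, here is a minimal Python sketch of how an admin might check whether an existing code env already covers a requested package list. It is not part of DSS: the `inventory` mapping (code env name to requirements-style text), the env names, and the helper functions are all hypothetical, and how you export that inventory from your instance is left out. It also only compares package names and ignores version pins, so a match is a candidate to look at, not a guarantee of compatibility.

```python
import re
from typing import Dict, List, Set


def normalize(name: str) -> str:
    """Normalize a package name so that case and '-', '_', '.' do not matter."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_names(requirements: str) -> Set[str]:
    """Extract package names from requirements-style text, ignoring version pins."""
    names = set()
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # The package name is the leading run of letters, digits, '.', '_' and '-'.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def envs_covering(requested: str, inventory: Dict[str, str]) -> List[str]:
    """Return names of code envs whose packages include every requested package."""
    needed = package_names(requested)
    return sorted(
        env_name
        for env_name, reqs in inventory.items()
        if needed <= package_names(reqs)
    )


# Hypothetical inventory: code env name -> its requested packages.
inventory = {
    "py39_ml": "pandas==1.5\nscikit-learn==1.2\nnumpy",
    "py39_nlp": "pandas\nspacy",
}

# A user asks for an env with pandas and scikit-learn.
candidates = envs_covering("Pandas\nscikit_learn>=1.0", inventory)
print(candidates)
```

If the list comes back non-empty, the admin can point the requester at one of those envs (after checking versions); if it is empty, extending the closest existing env or creating a new one is still a judgment call.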
Thanks,