Automate project deployments to Prod Instance

Hello all,

I’m exploring ways to optimize and automate processes within Dataiku and have a few questions:

  1. Is there a way to automate the deployment of projects to Ops (operational environments) within Dataiku? If so, what are the best practices or tools/plugins to achieve this?
  2. What are some recommended approaches for ensuring the quality of Python recipes in projects? Specifically, I’m looking for tools or strategies to:
    • Detect hardcoded passwords.
    • Identify bottlenecks in performance.
    • Spot bad practices or code smells.

I need to deploy lots of projects, and I spend hours just reviewing the Python recipes, the libraries, etc.

Many thanks in advance.

Answers

  • Turribeach
    1. You will most likely need to use the Dataiku Python API, which lets you automate most things you can do in the GUI. The level of automation will depend on which deployment tools you want to integrate with, how much you want to invest, and your own requirements. Dataiku bundle deployments can have pre- and post-deployment hooks that execute custom Python code, so you can do whatever you need at the appropriate stage. I would suggest you start by doing a full end-to-end manual deployment and then target the parts of the process you want to automate.
    2. Hardcoded passwords and bad practices in Python code are not specific to Dataiku, so you should look on the web, as there is plenty written about this. The recipe code can be obtained via the Dataiku Python API, after which you can use standard Python parsing techniques to validate the code against whatever standards you want to use. With regard to performance, I would start by analysing your scenario and job runs to identify long-running jobs, then go from there and see what’s worth optimising. I wouldn’t focus on bottlenecks but on jobs that are time sensitive and would add value to the business if they were optimised. Bottlenecks aren’t necessarily bad: using 100% of the CPU, network or disk I/O can be seen as a bottleneck, but it can also be seen as fully utilising the available resources.
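
To make the second point concrete, here is a minimal sketch of fetching recipe code and parsing it with Python's standard `ast` module. It only flags string literals assigned to credential-looking names; the name pattern and the helper names (`find_hardcoded_secrets`, `scan_project`) are illustrative assumptions, not part of Dataiku. The `scan_project` function assumes the `dataikuapi` client exposes `list_recipes()` and `get_settings().get_code()` as I understand them, so check that against your DSS version.

```python
import ast
import re

# Assumed naming convention for credential-like identifiers (adjust to taste).
SECRET_NAME = re.compile(r"(password|passwd|pwd|secret|token|api_?key)", re.IGNORECASE)


def _names(node):
    """Names bound by an assignment or a keyword argument."""
    if isinstance(node, ast.Assign):
        # Simple names (x = ...) and attributes (self.x = ...); tuples are skipped.
        return [getattr(t, "id", None) or getattr(t, "attr", None) for t in node.targets]
    if isinstance(node, ast.keyword):
        return [node.arg]  # None for **kwargs
    return []


def find_hardcoded_secrets(source):
    """Return sorted (line, name) pairs where a secret-looking name gets a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        value = getattr(node, "value", None)
        # Only non-empty string literals count as hardcoded values.
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str) and value.value):
            continue
        for name in _names(node):
            if name and SECRET_NAME.search(name):
                findings.append((value.lineno, name))
    return sorted(findings)


def scan_project(client, project_key):
    """Scan all Python recipes of a project; `client` is assumed to be a dataikuapi.DSSClient."""
    project = client.get_project(project_key)
    report = {}
    for item in project.list_recipes():
        if item["type"] != "python":
            continue
        code = project.get_recipe(item["name"]).get_settings().get_code()
        try:
            report[item["name"]] = find_hardcoded_secrets(code)
        except SyntaxError:
            report[item["name"]] = None  # recipe code could not be parsed
    return report
```

For code smells in general, the same fetched source could be written to temporary files and passed to an off-the-shelf linter rather than extending this scanner by hand.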

  • Gonzalo

    Hello Turribeach,

    Many thanks for the quick reply. I just read that there is something called deployment hooks that can be used to automate things. Could I use those deployment hooks, for example, to read all the Python recipes and rate their quality?

    If you have any docs or tutorials, they would be much appreciated. I tried to find documentation about deployment hooks and some hands-on examples, but did not find anything helpful.

    Thanks and Best regards,

    Gonzalo
