Best practice for setting up QA, stage, and production data-pipelines

Fahraynk

Hello all,

I've been building a large data pipeline, and the project is getting messy because I create a new branch for each new version of the pipeline. So I want to ask: what are the best practices for separating a project into development, staging, and production?

Should I keep development, staging, and production as separate branches in the same project, create separate projects, or use separate DSS servers?

Also, if I separate them into distinct projects, should they all share one input data source, or should I re-import the input data for each project?


Operating system used: pop-os

Answers

  • Marlan

    Hi @Fahraynk,

    Our approach is to utilize separate DSS instances for development, testing / staging, and production. Our processes reside within one project. We develop in that project on the development DSS instance and then deploy it to our test and ultimately our production instance when ready.

    Some of our data source connections are set up so that they point to different databases on each DSS instance. This allows us to automatically separate dev and test data from production data. Other connections are set to point to the same data tables across dev, test, and prod.

    We also use instance-level variables (defined differently on each DSS instance). Another mechanism we use is project variable overrides (i.e., "local variables" on the project variables screen): we set the project variable to the production value, and then, in the development instance's copy of the project, we override that variable with the development value.
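
To illustrate the override idea, here is a minimal sketch in plain Python. It does not call DSS at all; it only models the layering, under the assumption that local project variables take precedence over standard project variables, which in turn take precedence over instance-level variables. The function name `resolve_variables` and the variable names and values (`env`, `target_schema`, `analytics_prod`, `analytics_dev`) are hypothetical. Inside a DSS Python recipe, the already-resolved values would normally be read with `dataiku.get_custom_variables()` rather than merged by hand.

```python
# Hypothetical model of variable layering; names and values are illustrative only.

def resolve_variables(instance_vars, project_standard, project_local):
    """Merge variable layers; later layers override earlier ones.

    Assumed precedence: instance-level < project (standard) < project (local).
    """
    resolved = dict(instance_vars)
    resolved.update(project_standard)
    resolved.update(project_local)
    return resolved


# The project variable holds the production value and travels with the project.
project_standard = {"target_schema": "analytics_prod"}

# Development instance: a local override replaces the production value.
dev = resolve_variables(
    instance_vars={"env": "dev"},
    project_standard=project_standard,
    project_local={"target_schema": "analytics_dev"},
)

# Production instance: no local override, so the project value applies.
prod = resolve_variables(
    instance_vars={"env": "prod"},
    project_standard=project_standard,
    project_local={},
)

print(dev["env"], dev["target_schema"])
print(prod["env"], prod["target_schema"])
```

Because the production value is the default and only the development copy carries an override, a freshly deployed project needs no edits on the production instance.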

    Our development and test instances are on one server and the production instance is on another server.

    We also run our production projects under a service account.

    This all works quite nicely.

    Hope this is helpful.

    Marlan

  • Fahraynk

    Thank you. How do you go about deploying from development to production? Is it an easy export/upload, or are you manually copying the files over?

  • Marlan

    Hi @Fahraynk,

    We use the bundle functionality as described here: https://doc.dataiku.com/dss/latest/deployment/index.html

    That documentation refers to the Project Deployer, which we haven't set up yet, so we deploy bundles manually. Even so, it takes just a couple of minutes to deploy a project to another instance.
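
For readers who would rather script the bundle steps than click through them, the following is a rough sketch and not the process described above, which is done by hand. It assumes both arguments behave like project handles from the `dataikuapi` Python client, and that the methods `export_bundle`, `download_exported_bundle_archive_to_file`, `import_bundle_from_stream`, and `activate_bundle` are available in your DSS version; please check them against the API reference before relying on this. The function name `promote_bundle` and its parameters are hypothetical, and the sketch assumes the project already exists on the target instance.

```python
def promote_bundle(source_project, target_project, bundle_id, archive_path):
    """Export a bundle from one instance and activate it on another.

    Both project arguments are expected to behave like dataikuapi project
    handles; the method names used below are assumed from that client.
    """
    # Build the bundle on the development instance.
    source_project.export_bundle(bundle_id)

    # Save the bundle archive to a local file.
    source_project.download_exported_bundle_archive_to_file(bundle_id, archive_path)

    # Upload the archive to the target (test or production) instance.
    with open(archive_path, "rb") as archive:
        target_project.import_bundle_from_stream(archive)

    # Make the uploaded bundle the active version of the project.
    target_project.activate_bundle(bundle_id)
```

With `dataikuapi` installed, the two handles would typically come from something like `dataikuapi.DSSClient(url, api_key).get_project(project_key)`, once for each instance, where the URL, API key, and project key are your own values.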

    It all works quite nicely.

    Marlan
