Rebuild behaviour vs. excluding part of the flow from "build all"

MRvLuijpen

My question is whether, and how, I can exclude the first part of the flow from a "build all".

(Of course, without explicitly adding every dataset to be built to the scenario.)

My situation is as follows. Because of company policy, all source data files are automatically removed every night (the policy is that all data should be stored on SQL servers, not in external files).

My project flow has three source files:

  1. Daily-updated file (direct file upload)
  2. Weekly-updated file (located in a folder)
  3. Monthly-updated file (located in a folder)

My idea was to sync all three files to a SQL connection.

I want a scenario that runs daily and computes the complete flow (the dataset Result in the attached project flow).
How can I set this up to handle the following situations?

  1. If no files are available (e.g. during the weekend), only the flow in the Daily-Calculations zone should be computed.
  2. During the week, I upload the new daily file; it should be synced to the daily_file_sql dataset, and then the Daily-Calculations zone should be computed.
  3. Once a week, I also upload the weekly file; it should be synced to the weekly_file_sql dataset, and then the Daily-Calculations zone should be computed.
  4. Once a month, all three files are uploaded and synced, and then the Daily-Calculations zone should be computed.
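The four situations above boil down to one rule: re-sync a SQL copy only when its source file is present, then always recompute the Daily-Calculations zone. Below is a minimal, plain-Python sketch of that decision logic, assuming the run knows which files were uploaded since the nightly cleanup. `plan_daily_run`, `SYNC_TARGETS` and the dataset name `monthly_file_sql` are hypothetical; only `daily_file_sql`, `weekly_file_sql` and `Result` come from the question. Inside DSS, a custom Python scenario step could then build each returned name (for example with `build_dataset` from `dataiku.scenario.Scenario`), but that wiring is an assumption and is not shown here.

```python
from typing import List, Set

# daily_file_sql and weekly_file_sql come from the question;
# monthly_file_sql is a hypothetical name for the third synced table.
SYNC_TARGETS = {
    "daily": "daily_file_sql",
    "weekly": "weekly_file_sql",
    "monthly": "monthly_file_sql",
}


def plan_daily_run(available: Set[str]) -> List[str]:
    """Return the datasets to build, in order, for one daily scenario run.

    `available` holds the keys ("daily", "weekly", "monthly") of the source
    files that were uploaded since the nightly cleanup (hypothetical input).
    """
    steps: List[str] = []
    # Re-sync a SQL copy only when its source file is actually present;
    # otherwise the previously synced SQL table is reused unchanged.
    for key in ("daily", "weekly", "monthly"):
        if key in available:
            steps.append(SYNC_TARGETS[key])
    # The Daily-Calculations zone (ending in Result) is always recomputed.
    steps.append("Result")
    return steps


if __name__ == "__main__":
    print(plan_daily_run(set()))                            # weekend: no files
    print(plan_daily_run({"daily"}))                        # normal weekday
    print(plan_daily_run({"daily", "weekly"}))              # weekly upload day
    print(plan_daily_run({"daily", "weekly", "monthly"}))   # monthly upload day
```

On a weekend the plan is just `["Result"]`; on the monthly upload day it syncs all three tables before building `Result`.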

I experimented with setting the rebuild behaviour (in the advanced settings) to "explicit" or "write-protected", but when the files were not present, the build-all failed.

I hope this is clear.

Answers

  • MRvLuijpen
    One option, of course, is to split the flow into two separate projects instead of two zones.
  • fchataigner2 (Dataiker)

    Hi,

    The SQL Server datasets right after the sync recipes should be the ones with their rebuild behavior set to "explicit". Their rebuild should then be triggered by scenarios with "on dataset change" triggers that listen for changes in the input folders or uploaded-files datasets.

    Alternatively, as suggested in the other reply, move the first zone to a separate project and expose the SQL Server datasets produced by the syncs to the project containing the daily flow.
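The intended effect of the first suggestion can be illustrated with a toy model of a recursive build. This is a simplified sketch, not DSS's actual build engine: it ignores out-of-date checks and treats every dataset with a recipe as rebuildable. The assumption it encodes is the one the reply relies on: a dataset whose rebuild behaviour is "explicit" is reused as-is when a downstream build reaches it, and is only rebuilt when it is the requested target (here, by its own change-triggered scenario). The names `FLOW`, `EXPLICIT`, `recursive_build`, `monthly_file` and `monthly_file_sql` are hypothetical, and the whole Daily-Calculations zone is collapsed into `Result`.

```python
from typing import Dict, List, Set

# Simplified flow: each dataset maps to the datasets its recipe reads.
# Source datasets (uploads/folders) have no recipe, hence no inputs.
FLOW: Dict[str, List[str]] = {
    "daily_file": [],
    "weekly_file": [],
    "monthly_file": [],  # hypothetical name
    "daily_file_sql": ["daily_file"],
    "weekly_file_sql": ["weekly_file"],
    "monthly_file_sql": ["monthly_file"],  # hypothetical name
    "Result": ["daily_file_sql", "weekly_file_sql", "monthly_file_sql"],
}

# Datasets with rebuild behaviour "explicit" (the SQL tables after the syncs).
EXPLICIT: Set[str] = {"daily_file_sql", "weekly_file_sql", "monthly_file_sql"}


def recursive_build(target: str, flow: Dict[str, List[str]],
                    explicit: Set[str]) -> List[str]:
    """Return the datasets rebuilt, upstream first, when building `target`."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(name: str, requested: bool) -> None:
        if name in seen:
            return
        seen.add(name)
        # An explicit dataset reached as an upstream dependency is reused
        # as-is, so its (possibly deleted) source file is never read.
        if name in explicit and not requested:
            return
        for parent in flow[name]:
            visit(parent, requested=False)
        # Only datasets produced by a recipe are rebuilt.
        if flow[name]:
            order.append(name)

    visit(target, requested=True)
    return order
```

With `EXPLICIT` as above, `recursive_build("Result", FLOW, EXPLICIT)` returns only `["Result"]`, so the daily scenario never touches the missing files, while a change-triggered scenario building `daily_file_sql` still re-runs that sync. With an empty `EXPLICIT` set, the same call rebuilds all three SQL tables first, which is where missing source files would make the build fail.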

  • MRvLuijpen

    Hello @fchataigner2,

    Thanks for your reply. I will follow up on your suggestion.
