Start Flow from specific dataset

Hello,

As a DSS developer, I would like to be able to run part of the Flow starting from a dataset of my choice (a start point).

Currently, I can either run a single recipe ("build only this dataset") or rebuild the whole Flow from the beginning, which can take a long time. In the Flow, I can also select "Build flow outputs reachable from here". It would be useful to have a "Build flow from here until the last dataset" option as well.

Best of all would be a "force rebuild dependencies" option with a customizable start point and end point.

Regards

5 Comments
AshleyW
Dataiker
Status changed to: Needs Info

Hi @Tuong-Vi ,

I'm not sure what you meant by the difference between "build flow outputs reachable from here" and "build flow from here until the last dataset". Could you clarify that?

Best,

Ashley

Note: I'll log the 'top option' you mentioned, where you'd like to be able to build a section of the Flow that you've selected. Based on previous discussions we've had around this idea, it looks unlikely that we'll implement it, but I've added your idea to the existing group of requests.

Marlan
Neuron

Hello @Tuong-Vi  and @AshleyW. I've wanted to be able to do this as well.

I would describe it as building the datasets that are selected if I right click on a dataset and click "Select all downstream". 

Maybe it could be implemented by offering an option to build selected parts of the flow. So one could "Select all downstream" and then "Build selected". 
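
The "Select all downstream" then "Build selected" idea can be sketched outside DSS as a graph traversal. The snippet below is only an illustration, not how DSS implements anything: the Flow is modeled as a plain dictionary mapping each dataset to the datasets computed directly from it, and the helper `downstream_in_build_order` and all dataset names are hypothetical. It also assumes the start point itself is not rebuilt.

```python
from collections import deque


def downstream_in_build_order(flow, start):
    """Return the datasets downstream of `start`, in an order safe to build.

    `flow` maps each dataset name to the datasets computed directly from it
    (a hypothetical stand-in for the Flow, assumed to be acyclic).
    """
    # Step 1: collect everything reachable from the start point.
    selected = set()
    queue = deque(flow.get(start, []))
    while queue:
        name = queue.popleft()
        if name not in selected:
            selected.add(name)
            queue.extend(flow.get(name, []))

    # Step 2: for each selected dataset, count its parents that are
    # themselves selected (parents outside the selection are not rebuilt).
    pending = {name: 0 for name in selected}
    for parent in selected:
        for child in flow.get(parent, []):
            pending[child] += 1

    # Step 3: Kahn's algorithm, so a dataset is listed only after every
    # selected dataset it depends on.
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for child in sorted(flow.get(name, [])):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    return order


# Hypothetical Flow: raw -> cleaned -> {joined, stats}, reference -> joined -> report
example_flow = {
    "raw": ["cleaned"],
    "cleaned": ["joined", "stats"],
    "reference": ["joined"],
    "joined": ["report"],
}
print(downstream_in_build_order(example_flow, "cleaned"))
# prints ['joined', 'stats', 'report']
```

Note that "raw" and "reference" are left alone: only datasets reachable from the start point are selected, which is the behavior being asked for here.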

Note that since I typically work with SQL datasets I always use the force rebuild option.

Also, I thought that "build flow outputs reachable from here" might do this (i.e., build downstream datasets), but it seems to want to rebuild more than that, at least when I select "force rebuild dependencies". If I select "build required dependencies", then none of the downstream SQL datasets are rebuilt. So it either rebuilds far more than I want or nothing at all, and I don't use it.

It'd be nice to have the requested option in a Scenario step as well. I can work around this by including a series of "build only this dataset" steps (with force rebuild). So I can accomplish what I need; it's just more difficult than it needs to be.
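
The scenario workaround described above, a series of "build only this dataset" steps with force rebuild, amounts to the loop below. This is a hedged sketch rather than a tested DSS scenario: `build_selected`, `fake_build`, and the dataset names are hypothetical, and the build action is passed in as a function so the loop can be tried without a DSS instance. In a real project that function would wrap whatever call your DSS version provides for a forced, non-recursive build of a single dataset; check the Dataiku API documentation before relying on any specific call.

```python
from typing import Callable, Dict, Iterable, List


def build_selected(
    datasets: Iterable[str], build_one: Callable[[str], None]
) -> Dict[str, str]:
    """Build each dataset in the given order, stopping at the first failure."""
    results: Dict[str, str] = {}
    for name in datasets:
        try:
            build_one(name)
        except Exception as exc:
            results[name] = f"failed: {exc}"
            # Stop here: later datasets would be built from stale inputs.
            break
        results[name] = "built"
    return results


built_log: List[str] = []


def fake_build(name: str) -> None:
    """Stand-in for a forced, non-recursive build of one dataset."""
    built_log.append(name)


# The order matters: list upstream datasets before the ones that read them.
summary = build_selected(["joined", "stats", "report"], fake_build)
print(summary)
```

Passing the build function as a parameter keeps the ordering and error handling separate from the DSS-specific call, so the same loop could be reused from a scenario's custom Python step or from an external script.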

Marlan

Tuong-Vi
Neuron

Hello,

I agree with you, @Marlan: this option would be useful in a Scenario too. Another option I would like to see in scenarios is an action for "all datasets". It would avoid having to select datasets one by one for global actions such as synchronizing the Hive metastore or computing metrics. Best of all would be the ability to check/uncheck datasets in the list (datasets to compute).

[Attached image: Sans titre.png]

AshleyW
Dataiker
Status changed to: In Backlog

Hi @Tuong-Vi

This idea has been added to our backlog.

 

Tuong-Vi
Neuron

Hello, thank you for your attention to this matter.

Have a nice day.
