How to initialize empty dataset?

erbinlim
Level 2

Hello,

 

In a simplified example, I have a flow with two pipelines. The final output of the first pipeline is DatasetA, and the first input of the second pipeline is also DatasetA. Both point to the same path on the filesystem. The real flow is more complicated than this, which is why I cannot simply link the two pipelines linearly.

The problem appears when I duplicate the project. In the duplicated project, DatasetA is empty, so it shows the error "Root path does not exist" and the flow refuses to run. I have to manually run the first pipeline so that there is at least some data there before the second pipeline can be populated.

Is there a good way to solve this? Either by initializing an empty dataset or something along those lines? What's the best practice for this? 

Thank you.

1 Reply
Makoto
Dataiker

I came up with two options for you.

 

1. If you don't want to link the two subflows linearly because it makes the flow too complicated visually, you can use Flow Zones (available from DSS version 8).

Here, the final output of the flow in zone 1 is shared as the input of the flow in zone 2, but the flow zones keep the two visually separate.

Screenshot 2020-11-25 at 11.44.23.png

2. If you do need to keep the two flows clearly separate, use a scenario to build the first and second flows sequentially. To make the flow easier to build, you can then place a button on a dashboard that runs the scenario.

 

Screenshot 2020-11-25 at 11.55.34.png
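
If you would rather trigger that scenario from code than from a dashboard button, something like the following might work. It is only a minimal sketch: it assumes the scenario already exists and contains the two build steps in order, and the project key and scenario ID in the usage comment are hypothetical placeholders. It relies on the `get_project`, `get_scenario` and `run_and_wait` calls of the DSS Python API client.

```python
def run_build_scenario(client, project_key, scenario_id):
    """Run an existing scenario and block until it has finished.

    `client` is assumed to be a DSS API client, for example the object
    returned by `dataiku.api_client()` when running inside DSS.
    """
    project = client.get_project(project_key)
    scenario = project.get_scenario(scenario_id)
    # The scenario itself is assumed to hold two build steps in order:
    # first the pipeline that writes DatasetA, then the one that reads it.
    return scenario.run_and_wait()


# Hypothetical usage inside a DSS notebook or Python recipe
# (the project key and scenario ID below are made-up placeholders):
#
#   import dataiku
#   run_build_scenario(dataiku.api_client(), "MY_PROJECT_COPY", "BUILD_BOTH_FLOWS")
```

Running this once after duplicating the project should populate DatasetA before the second pipeline reads it, without having to build the first pipeline by hand.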

 

Hope this helps,

 

Makoto
