Reuse datasets inside a project

jlbellier

Hello everybody,

I would like to understand how reusing datasets works and how it is managed, because it is not very clear to me.
Here is my case: I have a dataset that is the output of a recipe (let's call it D1). I would like to use D1 in another flow zone of my project without creating a full copy of it, but rather a linked copy (like a symbolic link under Unix).

I am using version 9.0.1. I suppose it is the Enterprise edition (it is my customer's installation), but I cannot find this information.

Here is what I did:

1) I right-clicked on D1 in the source flow zone Z1.

2) I selected "Share to a flow zone" and chose the zone where the link should be created (let's call it Z2).

After validation, I can see D1 in the source zone with a light blue border, and the share of D1 (let's call it D1_share) in the new flow zone.

After a few operations, I want to remove D1_share from Z2 because I realized it is not needed there.
So I right-click on D1_share and select "Delete". After validation, DSS warns me that all the recipes having D1 as input or output will be removed in every zone where D1 is involved.

So I guess I am doing this the wrong way, and it makes me think that D1_share is not a linked copy as I expected, but a pointer to the source.
How can I make a linked copy rather than a true copy? When I right-click on D1 in the source zone and select "Copy" under "Other actions", I cannot find a way to do this. All the copies are "true copies".

Could you please shed some light on this?

Thank you in advance.

Best regards,

Jean-Luc.


Operating system used: Windows

Answers

  • Turribeach

    I think you have already answered your own questions. Shared datasets are just pointers to the original dataset. If you delete a shared dataset, you delete the original dataset itself, along with everything else that gets deleted when you delete a dataset (such as the recipes that use it as input or output). So if you only want to remove the share, don't delete it: right-click on it and select "Unshare". Other than that, I am not sure what you think is not possible or what functionality you are missing. Just because deleting a symlink in Linux removes only the link, it doesn't mean Dataiku needs to behave the same way. Alternatively, if you prefer, you can think of "Unshare" in Dataiku as the equivalent of deleting a symlink in Linux.
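
To make the difference between "Unshare" and "Delete" concrete, here is a hypothetical toy model in plain Python. It is not Dataiku's API: the class ToyFlow and its methods are invented for illustration, and it simply assumes that a share is a second reference to the one underlying dataset, with no copy of the data. Under that assumption, there is no separate "share object" to delete, so deleting through any zone removes the dataset and its recipes, while unsharing only drops the reference.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFlow:
    """Hypothetical toy model of a project flow (NOT the Dataiku API)."""
    home_zone: dict = field(default_factory=dict)  # dataset name -> zone that owns it
    shared_in: dict = field(default_factory=dict)  # dataset name -> zones showing a share
    recipes: dict = field(default_factory=dict)    # recipe name -> (inputs, outputs)

    def add_dataset(self, name: str, zone: str) -> None:
        self.home_zone[name] = zone
        self.shared_in[name] = set()

    def share(self, name: str, zone: str) -> None:
        # A share only records a reference; no data is copied.
        self.shared_in[name].add(zone)

    def unshare(self, name: str, zone: str) -> None:
        # Dropping the reference leaves the dataset and its recipes untouched.
        self.shared_in[name].discard(zone)

    def delete(self, name: str) -> None:
        # There is only one dataset behind every share, so deleting it from
        # any zone removes it everywhere, plus every recipe reading or writing it.
        del self.home_zone[name]
        del self.shared_in[name]
        self.recipes = {
            r: (ins, outs)
            for r, (ins, outs) in self.recipes.items()
            if name not in ins and name not in outs
        }

    def zones_showing(self, name: str) -> set:
        return {self.home_zone[name]} | self.shared_in[name]


if __name__ == "__main__":
    flow = ToyFlow()
    flow.add_dataset("D1", "Z1")
    flow.recipes["compute_D1"] = (["raw"], ["D1"])
    flow.recipes["use_D1"] = (["D1"], ["D2"])

    flow.share("D1", "Z2")
    print(flow.zones_showing("D1"))  # both Z1 and Z2 show the same D1

    flow.unshare("D1", "Z2")
    print(flow.zones_showing("D1"), len(flow.recipes))  # only Z1; both recipes kept

    flow.share("D1", "Z2")
    flow.delete("D1")  # "deleting the share" in Z2 means deleting D1 itself
    print(flow.recipes)  # every recipe touching D1 is gone
```

In this model, "Unshare" is the operation that behaves like removing a Unix symlink, which matches the advice in the answer.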
