Having the same dataset across projects as output for more than one recipe

Solved!
Chandra_Mouli_R
Level 2

Hi All,

I have been working in Dataiku for more than a year now and have some questions. Please read through the points below and share your solution.

What I Know:

There is an Exposed objects / Share option to use a dataset across projects. It has quite a few limitations; the main one is that we cannot write output to the shared dataset from Project B.

What I don't know:

Is it possible to write or save the output rows of more than one flow zone into the same dataset, without any workarounds? [I currently use a workaround because of the way the dataset gets built.]

Is it possible to write or save the output rows of more than one project into the same dataset? Currently we can share a dataset across projects to use it as an input in Project B, but can we use it as an output dataset?

Thanks,
Chandra Mouli R


Operating system used: Windows

6 Replies
Manuel
Dataiker Alumni

Hi,

Did you try the append option? You can find this in the input/output settings of a recipe. See attached image.

I hope this helps.

Chandra_Mouli_R
Level 2
Author

Hi @Manuel 
It won't help, either across different projects or across more than one recipe. We are not able to use the same dataset as output in more than one project or recipe.
Moreover, append is for inserting new records alongside the old records without replacing them.
Thanks for trying!

Manuel
Dataiker Alumni

Hi,

I assumed you wanted projects to collaborate on a dataset (appending), but instead it seems you want projects to compete (overwrite). 
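
The difference between appending and overwriting can be sketched outside DSS. The following is only a minimal illustration using Python's built-in sqlite3 module, not Dataiku code: the table name `shared_dataset`, the columns, and the helpers `write_rows` and `read_rows` are hypothetical stand-ins for two recipes writing to one underlying table.

```python
import sqlite3

def write_rows(conn, table, rows, append):
    """Write rows to `table`; existing rows are dropped first unless append is True."""
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (source TEXT, value INTEGER)")
    if not append:
        # Overwrite mode: the previous content of the table is discarded.
        conn.execute(f"DELETE FROM {table}")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()

def read_rows(conn, table):
    """Return all rows of `table` in insertion order."""
    return conn.execute(
        f"SELECT source, value FROM {table} ORDER BY rowid"
    ).fetchall()

conn = sqlite3.connect(":memory:")

# "Project A" builds the table, then "Project B" appends to it.
write_rows(conn, "shared_dataset", [("project_a", 1)], append=False)
write_rows(conn, "shared_dataset", [("project_b", 2)], append=True)
appended = read_rows(conn, "shared_dataset")

# "Project B" now writes in overwrite mode: Project A's row is gone.
write_rows(conn, "shared_dataset", [("project_b", 3)], append=False)
overwritten = read_rows(conn, "shared_dataset")
```

With append, rows from both writers accumulate; with overwrite, whichever writer ran last determines the content of the table.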

Is your challenge simply about defining an existing dataset as output? If that is the case, this is also possible:

  • Datasets are not stored in DSS, but in the underlying data platforms (database tables, blob storage, files);
  • The dataset icon in the flow is a “pointer” to a dataset stored somewhere;
  • You can edit two datasets to make sure they point to the same underlying table;

See my examples attached, I have two dummy projects overwriting the same dataset:

  • By default, DSS prefixes the underlying table with the project key (making the table unique for that project);
  • By editing the dataset settings and removing the project key from the table names, you have two projects writing to the same dataset;
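
As a rough sketch of the edit described above, the snippet below removes a project-key prefix from a table name. It works on plain dictionaries rather than on real DSS objects: the dictionary keys, the connection name `my_sql_connection`, the project keys `PROJECTA` and `PROJECTB`, the `${projectKey}_` form of the prefix, and the helper `strip_project_prefix` are all assumptions made for illustration. In the thread itself the change is made by hand in the dataset settings screen.

```python
def strip_project_prefix(table_name, project_key):
    """Return table_name without a leading project-key prefix, if one is present."""
    # Assumed prefix forms: a variable placeholder or the literal project key.
    for prefix in ("${projectKey}_", f"{project_key}_"):
        if table_name.startswith(prefix):
            return table_name[len(prefix):]
    return table_name

# Hypothetical settings of the same dataset as created in two projects.
project_a = {"connection": "my_sql_connection", "table": "${projectKey}_Shared_dataset"}
project_b = {"connection": "my_sql_connection", "table": "PROJECTB_Shared_dataset"}

for params, key in ((project_a, "PROJECTA"), (project_b, "PROJECTB")):
    params["table"] = strip_project_prefix(params["table"], key)

# Both dictionaries now name the same table on the same connection.
same_table = project_a == project_b
```

Once the prefix is gone from both, the two dataset definitions point at one table, which is what lets both projects write to it.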

I hope this helps.

Chandra_Mouli_R
Level 2
Author

Hi @Manuel 

I will explain step by step where the problem occurs while implementing your solution. Please correct me!

In Project A, I used a recipe on a dataset to create an output, "Shared_dataset".

During this process, a window asks for the new dataset name. I entered "Shared_dataset"; the storage location is selected by default. After clicking Create Recipe, the window moves to the recipe details.

After filling out the recipe details, there is no setting for the output dataset before the recipe is created. Once the recipe is created, we return to the flow, click Explore on "Shared_dataset" [the output of the recipe] and go to Settings, where a window resembling your screenshot appears.

The table name by default has the $project key prefix, giving "$project_Shared_dataset". I removed the prefix and saved the dataset settings, as you said in your solution.

In Project B, creating a recipe asks for an output. When I click on "Existing Dataset", Shared_dataset is not listed there.

I tried exposing the dataset, that is, sharing it across the projects. Shared_dataset then appeared in black in Project B, but even after that it does not come up in the list of available datasets.

Thanks,
Chandra Mouli R

Manuel
Dataiker Alumni

Hi,

No problem. You have two options:

  • You either do the same as you did in Project A: create the recipe with a new output dataset, then edit its settings afterwards so that they match those of the dataset in Project A;
  • Or, in Project B, before creating the recipe, do +dataset and add your shared dataset manually, with the same settings. Then, when creating the recipe, you should see the shared dataset listed as an existing dataset (I have not tested this).
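
For either option, "the same settings" means that both dataset definitions resolve to the same storage location. A small check along these lines could catch a mismatch; it is only a sketch over plain dictionaries, and the field names in `STORAGE_KEYS`, the sample values, and the helper `storage_mismatches` are hypothetical rather than taken from DSS.

```python
# Fields assumed to identify where a dataset is physically stored.
STORAGE_KEYS = ("type", "connection", "schema", "table")

def storage_mismatches(settings_a, settings_b, keys=STORAGE_KEYS):
    """Return {field: (value_a, value_b)} for every storage field that differs."""
    return {
        key: (settings_a.get(key), settings_b.get(key))
        for key in keys
        if settings_a.get(key) != settings_b.get(key)
    }

# Hypothetical settings copied by hand from the two projects.
in_project_a = {"type": "PostgreSQL", "connection": "my_sql_connection",
                "schema": "public", "table": "Shared_dataset"}
in_project_b = {"type": "PostgreSQL", "connection": "my_sql_connection",
                "schema": "public", "table": "PROJECTB_Shared_dataset"}

before_fix = storage_mismatches(in_project_a, in_project_b)

# After removing the project-key prefix in Project B, nothing differs.
in_project_b["table"] = "Shared_dataset"
after_fix = storage_mismatches(in_project_a, in_project_b)
```

An empty result means the two definitions agree on every compared field and therefore target the same table.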

I hope this helps.

Best regards

Chandra_Mouli_R
Level 2
Author

Hi @Manuel ,

Option 1 worked perfectly 🙂

Thanks,

Chandra Mouli R