Hi All,
I have been working with Dataiku for more than a year now and have some questions. Please read through them below and share your solution.
What I Know:
There is an Exposed objects / Share option for using a dataset across projects. It has quite a lot of limitations; the main one is that we cannot write output to the shared dataset from Project B.
What I don't know:
Is it possible to write or save the output rows of more than one flow zone into the same dataset without any workarounds? [I ask because I currently rely on a workaround, due to the way the dataset gets built.]
Is it possible to write or save the output rows of more than one project into the same dataset? Currently we can share a dataset across projects to use it as an input in Project B, but can we use it as an output dataset?
Thanks,
Chandra Mouli R
Operating system used: Windows
Hi,
No problem. You have two options:
I hope this helps.
Best regards
Hi @Manuel
It won't help across different projects, or even across more than one recipe. We are not able to use the same dataset as the output of more than one project or recipe.
Moreover, append is for inserting new records alongside the old records without replacing them.
Thanks for trying!
Hi,
I assumed you wanted projects to collaborate on a dataset (appending), but instead it seems you want projects to compete (overwrite).
Is your challenge simply about defining an existing dataset as output? If that is the case, this is also possible:
See my examples attached, I have two dummy projects overwriting the same dataset:
I hope this helps.
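To illustrate the idea of two projects writing to one underlying table, here is a minimal sketch in plain Python with SQLite. It is only an analogy and does not use the Dataiku API. The `${projectKey}` placeholder spelling, the project keys, and the helpers `resolve_table_name`, `write_rows` and `read_rows` are assumptions made up for this example. It shows that a per-project prefix yields separate tables, that a fixed name yields a single shared table, and how append differs from overwrite on that shared table.

```python
import sqlite3


def resolve_table_name(template: str, project_key: str) -> str:
    """Expand a project-key placeholder in a table name (hypothetical helper)."""
    return template.replace("${projectKey}", project_key)


def write_rows(conn, table, rows, append):
    """Write rows to `table`; clear existing rows first unless `append` is True."""
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (source TEXT, value INTEGER)'
    )
    if not append:
        conn.execute(f'DELETE FROM "{table}"')  # overwrite mode
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    conn.commit()


def read_rows(conn, table):
    """Return all rows of `table` in insertion order."""
    query = f'SELECT source, value FROM "{table}" ORDER BY rowid'
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stands in for the shared connection

# With the prefix, each project resolves to its own physical table.
default_a = resolve_table_name("${projectKey}_Shared_dataset", "PROJECT_A")
default_b = resolve_table_name("${projectKey}_Shared_dataset", "PROJECT_B")

# With the prefix removed, both projects resolve to the same table.
shared_a = resolve_table_name("Shared_dataset", "PROJECT_A")
shared_b = resolve_table_name("Shared_dataset", "PROJECT_B")

# Project A overwrites, then project B appends: rows from both are kept.
write_rows(conn, shared_a, [("A", 1), ("A", 2)], append=False)
write_rows(conn, shared_b, [("B", 3)], append=True)
appended = read_rows(conn, shared_a)

# Project B overwrites: only its rows remain, which project A also sees.
write_rows(conn, shared_b, [("B", 4)], append=False)
overwritten = read_rows(conn, shared_a)
```

In this sketch each project keeps its own dataset definition, and the sharing happens only at the storage level because both definitions resolve to the same table name.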
Hi @Manuel
I will explain step by step where the problem is when implementing your solution. Please correct me!
In Project A, I tried using a recipe on a dataset to create an output "Shared_dataset".
In this process, I get a window asking for the new dataset name. I gave "Shared_dataset", and the place where it will be stored is selected by default. Once I click Create Recipe, the window moves to the recipe details.
After filling out the recipe details, there is no setting for the output dataset before the recipe is created. After the recipe is created, we come back to the flow, click Explore on "Shared_dataset" [the output of the recipe] and go to Settings, where a window resembling your screenshot appears.
Now, the table name by default has the $project key prefix, "$project_Shared_dataset". I have removed the prefix and saved the dataset settings, as you said in your solution.
In Project B, creating a recipe asks for an output. When I click on "Existing Dataset", Shared_dataset is not listed there.
I tried exposing the dataset, meaning I shared it across the projects. Shared_dataset then appeared in black in Project B, but even after that it does not come up in the list of available datasets.
Thanks,
Chandra Mouli R