Pull dataset efficiency
Hi everyone,
Suppose I have a dataset in Snowflake, for example, that I have already pulled into one project, and I now want to use the same dataset in a new project. Which is more efficient: pulling the same dataset from Snowflake again, or using it from the project where it already exists?
Thank you
Best Answer
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,165 Neuron
It really depends on what you are doing with the dataset, what type of recipe you have, and where you are storing the output. If the dataset is stored in Snowflake, then what you have is a SQL pointer to your dataset, not the dataset itself. When you execute the flow, the data will be used in the flow accordingly. How the data moves will depend on what the next recipe does, where it runs, and where the output goes. For instance, if you have two visual recipes both using Snowflake for inputs and outputs, then there is no advantage in using an existing dataset from another project, other than saving whatever time it takes for the recipe that generates that dataset to run. If that time is too long, you could share the dataset from one project to the other, but you will need to make sure the shared dataset is refreshed in line with the other project too.
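To make the "pointer, not the data" idea concrete, here is a rough sketch. It is not the Dataiku API: `SqlDataset` and `run_in_database_recipe` are hypothetical names made up for illustration, and Python's built-in `sqlite3` stands in for Snowflake only so the example runs anywhere. It shows two projects each defining a dataset over the same table, with the recipe running entirely inside the database.

```python
import sqlite3
from dataclasses import dataclass

# Assumption: sqlite3 is only a stand-in for a Snowflake connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EU", 10), ("US", 20), ("EU", 5)],
)


@dataclass(frozen=True)
class SqlDataset:
    """Hypothetical model of a SQL dataset: a project key and a table name, no rows."""
    project: str
    table: str


def run_in_database_recipe(conn, source: SqlDataset, target: SqlDataset) -> None:
    """Aggregate source into target with a single SQL statement.

    Input and output both live in the database, so no rows are
    pulled out to the client while the recipe runs.
    """
    conn.execute(
        f"CREATE TABLE {target.table} AS "
        f"SELECT region, SUM(amount) AS total FROM {source.table} GROUP BY region"
    )


# Two projects define their own dataset over the same table: nothing is copied.
ds_a = SqlDataset(project="PROJECT_A", table="sales")
ds_b = SqlDataset(project="PROJECT_B", table="sales")

# The new project builds its own output directly from the shared table.
out_b = SqlDataset(project="PROJECT_B", table="sales_by_region")
run_in_database_recipe(conn, ds_b, out_b)

totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
print(totals)
```

Under these assumptions, defining the dataset a second time costs nothing by itself. The cost sits in whatever recipe produces the table, which is the case where sharing the dataset between projects may be worth it.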
Answers
Got it, thank you @Turribeach!