Can we detach the downstream flow from a dataset and attach it to another dataset?
Hi Team,
Hope you are all doing well during this pandemic!
I have a question about my use case. My dataset has more than 70 million rows, so even a simple prep recipe takes a long time to run. To speed up development, we thought of working on a sample of the data for now. My plan is to create a sample/filter recipe to take a chunk of the data (let me know if anything else can be used instead of this). For the final build, and going forward, I would then remove the sample/filter recipe and attach the downstream flow of that sample recipe back to the full dataset. Also, can we change the input dataset of a recipe after it is created? If yes, please help me with how to do it.
Can you also suggest ways to handle such large datasets and speed up the run time?
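For context, the kind of sampling I have in mind is roughly the following. This is only a sketch: the dataset names are placeholders, and I am assuming the sampling/limit options of get_dataframe in the dataiku Python package behave this way in our DSS version.

```python
import dataiku

# Read only the head of the large dataset instead of all 70M+ rows
# ("my_big_dataset" is a placeholder name)
df = dataiku.Dataset("my_big_dataset").get_dataframe(sampling="head", limit=1000000)

# Write the sample to a separate dataset used only during development
out = dataiku.Dataset("my_big_dataset_sample")
out.write_with_schema(df)
```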
Regards,
Tejasri.
Answers
tgb417 · Neuron · Posts: 1,598
In my experience with much smaller filesystem and PostgreSQL datasets, I can change both the input and output datasets of a recipe.
I do need to be a bit careful that the schemas of the two datasets being swapped are by and large the same.
Steps:
- Open your recipe
- Choose "Input/Output" at the top
- Choose the "Change" button
- Choose from the list of existing datasets or create a new dataset.
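If you need to do this programmatically, for example to swap the development sample back for the full dataset at final build time, something along these lines should work with the Dataiku Python API. This is only a sketch: the project key, recipe name, and dataset names are placeholders, and I am assuming the replace_input method on the recipe settings object is available in your DSS version.

```python
import dataiku

# Connect to the current DSS instance and open the project
client = dataiku.api_client()
project = client.get_project("MY_PROJECT")  # placeholder project key

# Fetch the recipe whose input we want to swap
recipe = project.get_recipe("compute_my_output")  # placeholder recipe name
settings = recipe.get_settings()

# Point the recipe at the full dataset instead of the development sample,
# then save the modified recipe settings back to DSS
settings.replace_input("my_big_dataset_sample", "my_big_dataset")
settings.save()
```

As in the UI steps above, make sure the schemas of the two datasets are by and large the same before swapping, and re-check the downstream schema propagation afterwards.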