Hope you are all doing well during this pandemic!
I have a question about my use case. My dataset has more than 70 million rows, so even a simple prep recipe takes a long time to run. We thought we could speed up development for now by working on a sample of the data. My plan is to create a sample/filter recipe to take a chunk of it (let me know if something else can be used instead). But for the final build, I will have to remove the sample/filter recipe and reattach the downstream flow to the full dataset. So: can we change the input dataset of a recipe after it is created? If yes, please help me with how to do it.
Can you also suggest ways to handle long-running jobs on large datasets like this to speed up run time?
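For context, the kind of chunked sampling I have in mind looks roughly like this outside the flow (a generic pandas sketch; the file path, fraction, and chunk size are placeholders, not anything from my actual project):

```python
import numpy as np
import pandas as pd

def sample_large_csv(path, frac=0.01, chunksize=1_000_000, seed=42):
    """Randomly keep about `frac` of the rows in a large CSV
    without ever loading the whole file into memory."""
    rng = np.random.default_rng(seed)
    parts = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        # Draw one uniform number per row; keep rows below the fraction.
        mask = rng.random(len(chunk)) < frac
        parts.append(chunk[mask])
    return pd.concat(parts, ignore_index=True)
```

With 70M rows, a 1% sample like this would give roughly 700k rows to develop against, which is usually enough to validate a prep recipe's logic before running it on the full data.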
In my experience (with a much smaller file system and PostgreSQL datasets), you can change both the input and output datasets of a recipe.
You do need to be a bit careful that the schemas of the two datasets you are swapping are by and large the same.
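One way to sanity-check that before re-pointing the recipe is to diff the two schemas directly. A minimal, tool-agnostic sketch in pandas (the sample/full dataset names are hypothetical; in practice you would load a small slice of each dataset):

```python
import pandas as pd

def schema_diff(a: pd.DataFrame, b: pd.DataFrame):
    """Report columns missing from either frame and shared
    columns whose dtypes differ."""
    missing_in_b = sorted(set(a.columns) - set(b.columns))
    missing_in_a = sorted(set(b.columns) - set(a.columns))
    dtype_mismatch = sorted(
        col for col in set(a.columns) & set(b.columns)
        if a[col].dtype != b[col].dtype
    )
    return missing_in_b, missing_in_a, dtype_mismatch
```

If all three lists come back empty, the swap should be safe; otherwise the downstream recipes will likely need schema updates after you change the input.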