We have a PySpark recipe with ~20 input datasets and one output dataset where we write our final data. The inputs are created by different scenarios, and we serially write each input to the output table via the PySpark recipe mentioned above.
Now, when we trigger the PySpark recipe for a specific input, it fails with the error "Input dataset is not ready" because some other inputs are still being loaded, even though those datasets are not used by the recipe at that point in time.
For example, let's say I have Dataset A and Dataset B as inputs of the PySpark recipe and Dataset C as the output. Dataset A is loaded with new data and I want to run the recipe to load A into C, but B is still being built. The recipe therefore fails because B is not yet ready. How can I avoid this and run the recipe for A as soon as A completes?
Any thoughts would be really helpful.
Since DSS cannot infer from your code that not all input datasets are required to execute it, it will throw an error when running a recipe whose inputs are not ready. You can work around this using one of the following approaches:
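Whichever approach you choose, the underlying pattern is the same: gate each input on a readiness check rather than letting one not-yet-built input fail the whole run. A minimal generic sketch of that pattern in plain Python (hypothetical helper names and in-memory stand-ins for datasets; this is not the DSS API):

```python
# Sketch of "process only the ready inputs" (hypothetical names, not DSS).

def load_ready_inputs(inputs, is_ready, load):
    """Return loaded data for every input whose readiness check passes,
    silently skipping inputs that are still being built."""
    loaded = {}
    for name in inputs:
        if is_ready(name):            # skip inputs that are not ready yet
            loaded[name] = load(name)
    return loaded

# In-memory stand-ins for datasets A and B:
status = {"A": True, "B": False}      # B is still being built
data = {"A": [1, 2, 3], "B": [4, 5]}

ready = load_ready_inputs(["A", "B"], status.get, data.get)
# Only A is loaded; B is skipped instead of causing the run to fail.
```

In DSS terms, the readiness check and the loading would be handled by the platform (e.g. by splitting the flow so each input feeds its own recipe, or by driving the build from a scenario), but the logic above is the behavior you are trying to achieve.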
Hoping this answers your need.