Queue jobs even if their dependencies are already being built by other jobs
Sometimes I want to queue up a build even while one of its dependencies is still building. If one of the dependencies currently being processed will take 40 minutes to finish, and I know all of its downstream dependencies will take at least 20 hours, I'd rather not have to wait 40 minutes before I can queue the downstream target dataset to refresh. Often, when I'm satisfied with the end product of a long flow, I want to go back and run a few of the immediately preceding datasets against the full data so I can validate a completed result, and then, once that's finished, refresh the entire flow from the very beginning, which can in many cases take several days to complete. In these cases, I'd like to be able to queue a build without it erroring out just because one of its dependencies is already being built. Instead, that job should simply build up to the dataset that's in progress, wait until it and anything that immediately depends on it are finished building, and then proceed. In most use cases this situation will never arise, since the main reason to queue a build while one is already running is so it can pick up overnight.
Comments
- Thanks, this has been logged internally. Just to be clear, @natejgardner, this is the same as https://community.dataiku.com/t5/Product-Ideas/Queue-downstream-execution-if-I-already-started-running-an/idi-p/15652, right, or is there some difference?
  Katie