
Scenario & Flow using prior results to modify scope of the current updates

tgb417
Neuron

I have a flow based on the local file system that uses shell recipes to find and evaluate files. The second stage, evaluation, is particularly network- and compute-intensive: going through all of the data currently takes on the order of 7 hours. I'd like to set up an incremental update instead, and I'd prefer not to code it all in a single Python recipe.

I'd like to use shell scripts to run the file-identification portion of the process every day, looking for moved and changed files, and then run the expensive file-evaluation section of the flow only for those files. In theory, this would save a great deal of time on each run.
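As a rough illustration of the daily identification step, here is a minimal sketch in Python (the flow itself uses shell recipes; Python is used here only to make the idea concrete). It assumes that a file's identity is its full path and that "changed" means its size or modification time differs. The names `build_inventory` and `Inventory` are hypothetical.

```python
import os
from typing import Dict, Tuple

# Hypothetical inventory format: path -> (size_in_bytes, mtime_seconds)
Inventory = Dict[str, Tuple[int, float]]


def build_inventory(root: str) -> Inventory:
    """Walk `root` and record size and modification time for every file os.walk lists."""
    inventory: Inventory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                continue
            inventory[path] = (st.st_size, st.st_mtime)
    return inventory
```

Written out as a dataset, this inventory is cheap to regenerate every day, since it only touches file metadata and never reads file contents.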

Is there a way to feed a dataset describing these files back into the shell script that does the computationally expensive part of the process? That is, if I've already evaluated a file and it has neither moved nor changed in size or modification date since the last full evaluation, just skip it this time.
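The skip rule described above can be sketched as a comparison between today's inventory and the inventory recorded at the last evaluation. This is a minimal sketch, assuming both inventories map path to `(size, mtime)`; a moved file shows up as a path not present in the previous inventory. The function name `files_needing_evaluation` is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical inventory format: path -> (size_in_bytes, mtime_seconds)
Inventory = Dict[str, Tuple[int, float]]


def files_needing_evaluation(current: Inventory, previous: Inventory) -> List[str]:
    """Return paths that are new or moved, or whose size or mtime changed."""
    stale = []
    for path, (size, mtime) in current.items():
        prior = previous.get(path)
        if prior is None:
            # Path not seen at the last evaluation: a new or moved file.
            stale.append(path)
        elif prior != (size, mtime):
            # Same path, but size or modification time differs.
            stale.append(path)
    return sorted(stale)
```

For example, with a previous inventory of `/data/a`, `/data/b`, `/data/c` and a current one where `/data/b` grew and `/data/c` was moved to `/data/d`, only `/data/b` and `/data/d` would be passed to the expensive evaluation step.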

The challenge, I think, is feeding the end results of the flow back into the middle of the flow for the comparison phase. (This would, in effect, put a loop into the flow.) My understanding is that loops are not allowed in a DSS flow. Is that correct?
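Whatever mechanism ends up making the previous run's results available, the last step of an incremental run has to merge them with the fresh evaluations. This is a minimal sketch of that merge, assuming results are keyed by file path; the names `merge_results` and `Results` are hypothetical, and it says nothing about how DSS itself wires datasets together.

```python
from typing import Dict, Iterable

# Hypothetical result format: path -> evaluation result (any value)
Results = Dict[str, object]


def merge_results(previous: Results, fresh: Results, current_paths: Iterable[str]) -> Results:
    """Carry forward prior results for files that still exist, then overlay fresh ones."""
    keep = set(current_paths)
    # Drop results for files that no longer exist at their old path (deleted or moved).
    merged = {path: result for path, result in previous.items() if path in keep}
    # Fresh evaluations replace any stale prior entry for the same path.
    merged.update(fresh)
    return merged
```

The merged output is exactly what the next run would need as its "previous results", which is where the loop in the flow comes from.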

Any thoughts about this?


Operating system used: Mac OS 10.15.7

--Tom