We recently had one of our cloud VMs shut down unexpectedly: the extreme heat caused a cooling failure, and the provider triggered VM shutdowns to prevent the data center from overheating (see here and here). Some scenarios were running at the time and, as expected, did not complete successfully. On reboot, Dataiku detected this situation and marked the scenario run as "Unterminated run, aborted when started DSS". However, the scenario did not trigger any notification steps on failure, nor did it execute any mail reporters configured to run on outcome != 'SUCCESS'. DSS is clearly able to detect unterminated scenario runs, since it marks them as Aborted with an exclamation mark. Support has confirmed that DSS only "cleans up the runtime database".

This Idea is to improve DSS so that it executes any matching scenario mail reporters (i.e., those with a condition such as outcome != 'SUCCESS') as soon as it detects an unterminated scenario run. This would make DSS more robust and remove the need for external monitoring outside DSS to catch these sorts of issues.


    Agreed. It would also help to detect long-running scenario runs that haven't terminated after a set period of time. If we could configure this server-side, that would be great.

    Thanks, all, for the feedback. We hear you on this, as well as on aborting long-running scenarios, and will let you know of any updates.



