In a flow, I want to delete a managed/local dataset that is created somewhere in the middle of the workflow, but only once the flow has finished.
So far I have tried adding a shell script at the end of the flow (its input is an HDFS dataset, though...). As the command I tried:
rm -rf /var/app/dss3/managed_datasets/DPP_DIGITAL_AGGREGATES.DM_AGGREGATED_V2/$DKU_DST_load_date (==> load_date is the name of the partitioning).
The script runs without errors, but when I look at the local directory the files (and the partition) are still there.
How can I delete a local dataset? And why doesn't this 'rm -rf path' work in the shell script?
You can use a scenario for this purpose, with a "Clear" step.
As to why the shell recipe does not remove the file, you should have a look at the logs to see if the file was found. If you're unsure you can use the -v option of rm to make it verbose.
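To make that check visible in the logs, here is a minimal sketch of a more talkative version of the command. The function name remove_partition is made up for illustration, and it assumes that $DKU_DST_load_date is actually set when the recipe runs. Note that rm -rf stays silent and still exits successfully when the path does not exist, so a wrong or differently expanded path looks like a successful run.

```shell
#!/bin/sh
# Hypothetical helper: remove one partition directory and report what happened.
remove_partition() {
    target="$1"
    # Refuse an empty argument so a missing variable cannot turn into a bare "rm -rf".
    if [ -z "$target" ]; then
        echo "refusing to run: empty path" >&2
        return 2
    fi
    # rm -f would stay silent here, so report a missing path explicitly.
    if [ ! -e "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi
    # -v prints every file and directory as it is removed.
    rm -rfv -- "$target"
}

# Example call, using the path from the question:
# remove_partition "/var/app/dss3/managed_datasets/DPP_DIGITAL_AGGREGATES.DM_AGGREGATED_V2/$DKU_DST_load_date"
```

If the log shows "not found", print the expanded path (for example with echo) to see what $DKU_DST_load_date really contains. Also keep in mind that if the variable is empty, the path in the example call ends at the dataset directory itself, so the whole dataset would be removed rather than one partition.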