Running Python recipe on GCP Cloud Storage folder that is empty

Mario_Burbano
Level 2

Hi, 

I have a Python recipe that points to a folder on a GCP Cloud Storage connection. Overall the recipe works well; however, when the folder is empty (I clean it out periodically), the recipe fails at the validation step:

 Validation failed: Failed to compute recipe status: Folder doesn't exist

I have checked in the GCP console and can confirm that the folder was indeed removed, when all I wanted was to purge the files inside it. The action I performed was a "Clear" on the folder.

I would like to be able to run this recipe without this type of error, either by clearing the folder's contents without deleting the folder itself, or by skipping the validation DSS performs before the recipe runs. I tried overriding the params.skipPrerunValidate recipe variable and setting it to true, but this does not seem to work. Does anyone have any other ideas?
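To illustrate what I mean by clearing the contents without removing the folder itself, here is a rough sketch that talks to GCS directly with the google-cloud-storage client (the bucket and prefix names below are placeholders, not my actual connection settings). Since a GCS "folder" is really just an object prefix, keeping one placeholder object under the prefix is what keeps the folder visible:

from google.cloud import storage

# Placeholder names: the real bucket/prefix come from the DSS connection settings
BUCKET = "my-bucket"
PREFIX = "dataiku/managed_folders/my_folder/"

client = storage.Client()
bucket = client.bucket(BUCKET)

# Delete every object under the prefix...
for blob in client.list_blobs(BUCKET, prefix=PREFIX):
    blob.delete()

# ...then write a zero-byte placeholder so the "folder" (prefix) still exists
bucket.blob(PREFIX + ".keep").upload_from_string(b"")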

Cheers,

3 Replies
SarinaS
Dataiker

Hi @Mario_Burbano ,

I wanted to confirm: are you writing to a managed folder that points to GCS? If so, clicking "Clear" on a managed folder via the UI does clear the contents of the managed folder, including the "prefix" itself.

 

However, I would expect the prefix to get re-created when you run your recipe.  If you are able to reproduce the error, would you mind attaching the following afterwards? 

  • your Python code 
  • a screenshot of the error 
  • a screenshot of your folder settings (like below) 
  • a screenshot of the Partitioning settings for the managed folder 

(Screenshot: example folder settings, "Screen Shot 2021-03-17 at 12.23.33 PM.png")

Then we can see what options might be available to avoid the issue you are facing. 
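As a side note on the prefix being re-created: because a GCS "folder" is really just an object prefix, any write from the recipe through the managed folder should bring it back. A minimal sketch, assuming the standard managed folder write API, with a placeholder folder id and file name:

import dataiku

folder = dataiku.Folder("FOLDERID")
# Writing any file re-creates the prefix on the GCS side
folder.upload_data("placeholder.txt", b"")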

Thanks,

Sarina 

 

Mario_Burbano
Level 2
Author

Hello @SarinaS,

Thanks for your reply. Please find attached the elements that you requested. The source code is just something like this:

import dataiku
folder = dataiku.Folder("FOLDERID")

Regards,
SarinaS
Dataiker

Hi @Mario_Burbano ,

Thank you for attaching this information! I haven't quite been able to reproduce the scenario you're running into here. Would you mind opening a support ticket and attaching a job diagnostic of the failing job run? You can get the diagnostic from the job page by clicking Actions > Download job diagnosis. I think that will make this easier to look into.

Thanks,

Sarina
