Solved!
Ankur30
Level 3
folder in S3 bucket in format /${project_key}/yyyy-mm-dd-filename

Hi,

I am looking to create a folder in my S3 bucket whose name is prefixed with the current datetime.

Can you please help me with the format for creating a datetime-named folder in S3 using Dataiku?

Regards,

Ankur

12 Replies
Dataiku
Community Manager

Thanks for tagging me in your question! I've left the nest to clear up a few things and help lead to a quicker solution!

AlexT
Dataiker

One way to do this is by creating a variable ${current-yyyy-mm-dd-var}, which you can append to the path of your S3 managed folder as ${project_key}/${current-yyyy-mm-dd-var}. But it's unclear whether this is enough for your use case: do you need to create this folder once, or create one every day as a subfolder within your existing folder?

Here is a code sample showing how to set the yyyy-mm-dd project variable:

import dataiku
from datetime import date

# Get a handle on the current project through the public API
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Store today's date (yyyy-mm-dd) as a standard project variable
variables = project.get_variables()
variables["standard"]["current-yyyy-mm-dd-var"] = str(date.today())
project.set_variables(variables)
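
With the variable set, the managed folder's path in its settings can then reference it, e.g. as ${project_key}/${current-yyyy-mm-dd-var}, as described above.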

 

Ankur30
Level 3
Author

Hi @AlexT, thanks for sharing, but I am looking for the second option, i.e. creating these every day as subfolders within my existing folder.

Can you please share code for that?

Warm Regards,

Ankur.

AlexT
Dataiker

Hi @Ankur30,

Here is an example Python recipe that does this:

import dataiku
from datetime import date

managed_folder_id = "xZUM5Sn9"  # id of the S3 managed folder
current_date = str(date.today())

# Read input dataset
my_dataset = dataiku.Dataset("my_dataset_name")
df = my_dataset.get_dataframe()

# Write the recipe output into a dated subfolder of the managed folder,
# e.g. /2021-10-21/filename.csv (upload_data or upload_stream can be used)
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("/" + current_date + "/" + "filename.csv",
                            df.to_csv(index=False).encode("utf-8"))
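
To verify the dated subfolder was written, a minimal check (a sketch, assuming the same output_folder handle as above) is to list the managed folder's contents:

# Sketch: list all paths in the managed folder; the new dated
# subfolder and file should appear, e.g. ['/2021-10-21/filename.csv']
paths = output_folder.list_paths_in_partition()
print(paths)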
Ankur30
Level 3
Author

Hi @AlexT,

Instead of the current date, I want to use the datetime in the path to my S3 bucket, so I changed the line below in the above code:

variables["standard"]["current-yyyy-mm-dd-var"] = str(pd.to_datetime('now'))

Although the job is succeeding, while building the output dataset in S3 I am getting the error "Root path doesn't exist".

Can you help me with this?

Regards,

Ankur.

AlexT
Dataiker

The value of str(pd.to_datetime('now')) contains a space, e.g. 2021-10-21 11:15:10.842966.

That could be your issue when trying to check or retrieve the files; you can look directly in S3 to see whether the files were created.

Can you share the code snippet and the exact error from the job logs?
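
If the space turns out to be the problem, one workaround (a sketch, not the only option; the exact format string is an assumption) is to build the variable value with strftime so it contains no spaces or colons:

from datetime import datetime

# Hypothetical replacement for str(pd.to_datetime('now')):
# a path-friendly timestamp, e.g. "2021-10-21-11-15-10"
variables["standard"]["current-yyyy-mm-dd-var"] = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")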

Ankur30
Level 3
Author

Hi @AlexT,

I am not getting an error in the job logs; the job runs fine. I only get the error when building the dataset.

I am able to see the files in the S3 bucket.

Regards,

Ankur.
AlexT
Dataiker

What is your dataset configuration, i.e. the bucket and path? The error suggests the dataset can't connect to the configured path.

Ankur30
Level 3
Author

Hi @AlexT,

Can you please point me to where I can find the details you are asking for, so that I can share a screenshot?
AlexT
Dataiker

Perhaps continuing this over a support ticket and sending us the job diagnostics is the best course of action here.

I'm not sure exactly which dataset type is failing to build, whether it is an S3 dataset, a Files in Folder dataset, etc.

If it's an S3 dataset, you can find the details here:

[screenshot: Screenshot 2021-10-21 at 14.36.02.png]

For a Files in Folder dataset:

[screenshot: Screenshot 2021-10-21 at 14.37.48.png]

Ankur30
Level 3
Author

Hi @AlexT,

I cannot build the dataset in Dataiku when I pass the current date/time as a project variable in the path to my S3 bucket. Building the dataset works fine when I instead pass current_date as the project variable in the path. That is the issue.

The rest of the code you shared works fine; the only problem is with the date/time variable in my S3 bucket path.

AlexT
Dataiker

It's difficult to visualize this without seeing your job diagnostics, which would give a clearer picture of what you have done so far. I would definitely encourage you to open a support ticket to continue troubleshooting.

My guess is that the date/time stored in the project variable was captured at a slightly different time, e.g. a few milliseconds or seconds earlier.

1) Can you check the exact value of your project variable and see whether it matches the exact path that exists in S3?

If it does match, then you are likely running a recursive build, where the variable value at the start of the build is the one used for the whole job. You can try to work around this by following the steps here:

https://community.dataiku.com/t5/General-Discussion/Asynchronous-project-variables-recursive-build/m...
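
A minimal sketch of one such workaround, assuming (based on the linked thread's title, so an assumption) that the variable is refreshed in a scenario Python step that runs before a separate build step:

import dataiku
from datetime import datetime

# Scenario "Execute Python code" step, placed before the build step:
# refresh the project variable so the subsequent build step uses the
# same timestamp that the recipes will write to.
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

variables = project.get_variables()
variables["standard"]["current-yyyy-mm-dd-var"] = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
project.set_variables(variables)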

[screenshot: Screenshot 2021-10-21 at 15.38.23.png]