Hi,
I am looking to create a folder in my S3 bucket with the current datetime as the folder name prefix.
Can you please help me with the format for creating a datetime folder in S3 using Dataiku?
Regards,
Ankur
Thanks for tagging me in your question! I've left the nest to clear up a few things and help lead to a quicker solution!
One way to do this is by creating a variable ${current-yyyy-mm-dd-var} which you can then append to the path of your S3 managed folder, e.g. ${project_key}/${current-yyyy-mm-dd-var}. But it's unclear whether this is enough for your use case: do you need to create this dataset once, or do you need to create these subfolders within your existing folder every day?
Here is a code sample showing how to set the yyyy-mm-dd project variable.

import dataiku
from datetime import date

# Get a handle on the current project through the public API
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

# Store today's date (yyyy-mm-dd) as a standard project variable
variables = project.get_variables()
variables["standard"]["current-yyyy-mm-dd-var"] = str(date.today())
project.set_variables(variables)
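Once the variable is set, it can be referenced in the managed folder's "Path in bucket" setting, for example (a sketch only; the exact path depends on your connection settings):

```
${project_key}/${current-yyyy-mm-dd-var}
```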
Hi @AlexT, thanks for sharing, but I am looking for the 2nd option, i.e. creating these subfolders within my existing folder every day.
Can you please share code for that?
Warm Regards,
Ankur.
Hi @Ankur30 ,
Here is an example Python recipe that would do this.
import dataiku
from datetime import date

managed_folder_id = "xZUM5Sn9"
current_date = str(date.today())

# Read input dataset
my_dataset = dataiku.Dataset("my_dataset_name")
df = my_dataset.get_dataframe()

# Write the CSV into a subfolder named after the current date
# (upload_data or upload_stream can be used to write the contents)
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("/" + current_date + "/filename.csv", df.to_csv(index=False).encode("utf-8"))
Hi @AlexT ,
Instead of the current date, I want to use a datetime in the path to my S3 bucket, so I changed the line below in the above code:
variables["standard"]["current-yyyy-mm-dd-var"] = str(pd.to_datetime('now'))
Although the job succeeds, while building the output dataset in S3 I am getting the error "Root path doesn't exist".
Can you help me with this?
Regards,
Ankur.
The value of str(pd.to_datetime('now')) contains a space, e.g. 2021-10-21 11:15:10.842966.
That could be your issue when trying to check or retrieve the files; you can look directly in S3 to see whether the files were actually created.
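If the space (or the colons) in the timestamp is the problem, here is a minimal sketch of building a path-safe timestamp with strftime (the format string is just one possible choice):

```python
from datetime import datetime

# Format the current timestamp without spaces or colons,
# e.g. 2021-10-21_11-15-10, which is safe to use in an S3 key
safe_ts = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
print(safe_ts)
```

You would store this string in the project variable instead of str(pd.to_datetime('now')).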
Can you share the code snippet and exact error from the job logs?
Hi @AlexT ,
I am not getting an error in the job logs; the job runs fine. I am only getting the error when building the dataset.
I am able to see the files in the S3 bucket.
Regards,
Ankur.
What is your dataset configuration, in particular the path in the bucket? The error suggests the dataset can't connect to the configured path.
Hi @AlexT ,
Can you please tell me where I can find the details you are asking about, so that I can share a screenshot?
Perhaps continuing this over a support ticket and sending us the job diagnostics is the best course of action here.
I'm not sure exactly which dataset type is failing to build, whether it is an S3 dataset or a Files from Folder dataset.
If it's an S3 dataset you can find the details here:
For a Files in Folder dataset:
Hi @AlexT ,
I cannot build the dataset in Dataiku when I pass the current date/time as a project variable in the path to my S3 bucket. Building the dataset works fine when I pass current_date as the project variable in the path. That is the issue.
The rest of the code you shared works fine; the only issue is with the date/time variable in my path to the S3 bucket.
It's difficult to visualize without seeing your job diagnostics to get a clear picture of what you have done so far. I would definitely encourage you to open a support ticket to continue troubleshooting this.
My guess is that the date/time stored in the project variable was captured at a slightly different time, e.g. a few milliseconds or seconds earlier, than the folder that was actually created.
1) Can you check the exact value of your project variable and see whether it matches a path that exists in S3?
2) If it does, then you are likely running a recursive build, and the variable value at the start of the build is the one used for the whole job; you can try to work around this by following the steps here:
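For the recursive-build pitfall, one common pattern (a sketch only; the dataset name is a placeholder, and this assumes the scenario Python API from the Dataiku docs) is to refresh the variable in a scenario Python step before the build step, so the build resolves the fresh value:

```python
from datetime import datetime
from dataiku.scenario import Scenario

scenario = Scenario()

# Refresh the project variable first, using a path-safe timestamp,
# so the subsequent build step sees the current value
scenario.set_project_variables(
    **{"current-yyyy-mm-dd-var": datetime.now().strftime("%Y-%m-%d_%H-%M-%S")}
)

# Then build the output dataset as a separate, later step
scenario.build_dataset("my_dataset_name")  # placeholder dataset name
```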