Folder in S3 bucket in the format /${project_key}/yyyy-mm-dd-filename
Hi
I am looking to create a folder in my S3 bucket whose name is prefixed with the current datetime.
Can you please help me with how to create a datetime-named folder in S3 using Dataiku?
Regards,
Ankur
Best Answer
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
It is difficult to get a clear picture of what you have done so far without seeing your job diagnostics. I would definitely encourage you to open a support ticket to continue troubleshooting this.
My guess is that the date/time in the project variable was stored at a slightly different time, e.g. a few milliseconds or a second earlier, so it may no longer match the path that was actually created.
1) Can you check the exact value of your project variable and see whether it matches the exact path that exists in S3?
If it does, then you are likely running a recursive build, and the variable value at the start of the build is the one used for that job. You can try to work around this by following the steps here :
Answers
-
Thanks for tagging me in your question! I've left the nest to clear up a few things and help lead to a quicker solution!
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
One way to do this is to create a variable ${current-yyyy-mm-dd-var}, which you can append to the path of your S3 managed folder, e.g. ${project_key}/${current-yyyy-mm-dd-var}. But it's unclear whether this is enough for your use case: do you need to create this dataset once, or do you need to create one every day in subfolders within your existing folder?
Here is a code sample showing how to set the yyyy-mm-dd project variable.
import dataiku
from dataiku.scenario import Scenario
from datetime import datetime, timedelta, date

client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
scenario = Scenario()

variables = project.get_variables()
current_date = str(date.today())
variables["standard"]["current-yyyy-mm-dd-var"] = current_date
project.set_variables(variables)
-
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner
Hi @AlexT
Thanks for sharing, but I am looking for the 2nd option, i.e. creating these every day in subfolders within the existing folder. Can you please share code for that?
Warm Regards,
Ankur.
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Hi @Ankur30
Here is an example Python recipe that would do this.
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from datetime import date

managed_folder_id = "xZUM5Sn9"
current_date = str(date.today())

# Read input dataset
my_dataset = dataiku.Dataset("my_dataset_name")
df = my_dataset.get_dataframe()

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
#upload_data or upload_stream
output_folder.upload_stream("/" + current_date + "/" + "filename.csv", df.to_csv(index=False).encode("utf-8"))
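The recipe above builds the target path by concatenating strings. As a minimal sketch of the same idea in plain Python (no Dataiku calls), the path can be built in one small helper so the date part and the file name are easy to change. The helper name `dated_path` is hypothetical and not part of the Dataiku API.

```python
from datetime import date
import posixpath


def dated_path(filename: str, day: date) -> str:
    """Return '/<yyyy-mm-dd>/<filename>', the layout used in the recipe above."""
    # day.isoformat() gives the same yyyy-mm-dd text as str(date.today())
    return posixpath.join("/", day.isoformat(), filename)


# Fixed date used here only so the result is reproducible
example_path = dated_path("filename.csv", date(2021, 10, 21))
print(example_path)
```

In the recipe, the result of `dated_path("filename.csv", date.today())` could then be passed as the first argument to the upload call.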
-
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner
Hi @AlexT
Instead of the current date, I want to use the datetime as the path to my S3 bucket, so I changed the line below in the above code.
variables["standard"]["current-yyyy-mm-dd-var"] = str(pd.to_datetime('now'))
The job succeeds, but while building the output dataset in S3 I get the error "Root path doesn't exist".
Can you help me with this?
Regards,
Ankur.
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
The output of str(pd.to_datetime('now')) has a space in it,
e.g. 2021-10-21 11:15:10.842966
That could be your issue when trying to check or retrieve the files. You can try looking directly in S3 to see if the files were created.
Can you share the code snippet and exact error from the job logs?
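If the space does turn out to be the problem, one option is to format the timestamp explicitly instead of relying on str(). This is only a sketch under that assumption: the helper name `path_safe_timestamp` and the chosen format are illustrative, and any format without spaces or colons would serve the same purpose.

```python
from datetime import datetime


def path_safe_timestamp(moment: datetime) -> str:
    """Format a datetime without spaces, colons, or fractional seconds."""
    # Underscore separates date and time; hyphens replace the colons
    return moment.strftime("%Y-%m-%d_%H-%M-%S")


# Same instant as the example above, fixed so the result is reproducible
stamp = path_safe_timestamp(datetime(2021, 10, 21, 11, 15, 10, 842966))
print(stamp)
```

The value could then be stored in the project variable in place of str(pd.to_datetime('now')), e.g. by passing datetime.now() to the helper.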
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
What is your dataset configuration? What is the blob path? The error suggests the dataset can't reach the path it is configured with.
-
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,212 Dataiker
Perhaps continuing this over a support ticket and sending us the job diagnostics is the best course of action here.
It is not clear exactly which dataset type is failing to build, i.e. whether this is an S3 dataset, a Files in Folder dataset, etc.
If it's an S3 dataset, you can find the details :
For a Files in Folder dataset :
-
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner
Hi @AlexT
I cannot build the dataset in Dataiku when I pass the current date/time as a project variable in the path to my S3 bucket. It works fine, and the dataset builds, when I pass current_date as the project variable in the path to my S3 bucket. That is the issue.
The rest of the code you shared above is working fine; the only issue is with the date/time variable in my path to the S3 bucket.