Folder in S3 bucket in format /${project_key}/yyyy-mm-dd-filename

Ankur30
Ankur30 Partner, Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Dataiku DSS Adv Designer Posts: 40 Partner

Hi

I am looking to create a folder in my S3 bucket whose name is prefixed with the current datetime.

Can you please help me with the format for creating a datetime folder in S3 using Dataiku?

Regards,

Ankur

Best Answer

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209 Dataiker
    Answer ✓

    Without your job diagnostics it is hard to get a clear picture of what you have done so far, so I would encourage you to open a support ticket to continue troubleshooting this.

    My guess is that the date/time stored in the project variable was captured at a slightly different moment, e.g. a few milliseconds or a second earlier, than the one used for the path in S3.

    1) Can you check the exact value of your project variable and see whether it matches exactly the path that exists in S3?

    If it does match, then you are likely running a recursive build, in which case the variable value at the start of the build is the one used for the whole job. You can try to work around this by following the steps here:

    https://community.dataiku.com/t5/General-Discussion/Asynchronous-project-variables-recursive-build/m-p/19811

    Screenshot 2021-10-21 at 15.38.23.png
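
One way to rule out the timing mismatch described above is to compute the timestamp once and derive every path from that single string. The following is only a minimal sketch in plain Python: the helper `paths_for_run`, the project key `MYPROJECT` and the timestamp format are hypothetical and not part of the Dataiku API.

```python
from datetime import datetime


def paths_for_run(project_key, stamp):
    """Derive every value for one run from a single, already-computed stamp."""
    root = f"{project_key}/{stamp}"
    return {
        "variable_value": stamp,                # what would go into the project variable
        "dataset_root": root,                   # what the dataset path would resolve to
        "file_path": f"{root}/filename.csv",    # what would actually be written
    }


# Call datetime.now() exactly once; calling it again later would give a new value.
stamp = datetime.now().strftime("%Y%m%dT%H%M%S")
paths = paths_for_run("MYPROJECT", stamp)  # "MYPROJECT" is a placeholder key
print(paths)
```

Because all three values come from the same `stamp`, the path that is written and the path that is later read cannot drift apart by a few milliseconds.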

Answers

  • Dataiku
    Dataiku Administrator, Dataiker, Alpha Tester Posts: 88 Administrator

    Thanks for tagging me in your question! I've left the nest to clear up a few things and help lead to a quicker solution!

  • Alexandru
    edited July 17

    One way to do this is to create a variable ${current-yyyy-mm-dd-var} and append it to the path of your S3 managed folder, e.g. ${project_key}/${current-yyyy-mm-dd-var}. However, it is unclear whether this is enough for your use case: do you need to create this dataset once, or do you need to create a new subfolder inside your existing folder every day?

    Here is a code sample showing how to set the yyyy-mm-dd project variable.

    import dataiku
    from dataiku.scenario import Scenario
    from datetime import date

    # Handle on the current project through the API client
    client = dataiku.api_client()
    project = client.get_project(dataiku.default_project_key())
    # Handle on the running scenario (not used below)
    scenario = Scenario()

    # Store today's date (yyyy-mm-dd) in a standard project variable
    variables = project.get_variables()
    current_date = str(date.today())
    variables["standard"]["current-yyyy-mm-dd-var"] = current_date
    project.set_variables(variables)

  • Ankur30

    Hi @AlexT,
    Thanks for sharing, but I am looking for the second option, i.e. creating a new subfolder inside the existing folder every day.

    Can you please share code for that?

    Warm Regards,

    Ankur.

  • Alexandru
    edited July 17

    Hi @Ankur30,

    Here is an example Python recipe that would do this.

    import dataiku
    from datetime import date

    managed_folder_id = "xZUM5Sn9"
    current_date = str(date.today())  # e.g. "2021-10-21"

    # Read input dataset
    my_dataset = dataiku.Dataset("my_dataset_name")
    df = my_dataset.get_dataframe()

    # Write recipe outputs into a dated subfolder of the managed folder.
    # upload_data takes bytes; upload_stream takes a file-like object instead.
    output_folder = dataiku.Folder(managed_folder_id)
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    output_folder.upload_data("/" + current_date + "/" + "filename.csv", csv_bytes)
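
The recipe above writes to a dated subfolder, while the thread title asks for the format /${project_key}/yyyy-mm-dd-filename. As a rough sketch of that second layout in plain Python, the hypothetical helper `dated_file_path` below builds such a path; the project key `MYPROJECT` is a placeholder (inside DSS it could presumably come from `dataiku.default_project_key()`).

```python
from datetime import date


def dated_file_path(project_key, filename, day=None):
    """Build a path of the form /<project_key>/yyyy-mm-dd-<filename>."""
    day = day or date.today()
    # date.isoformat() yields yyyy-mm-dd, with no spaces or colons
    return f"/{project_key}/{day.isoformat()}-{filename}"


print(dated_file_path("MYPROJECT", "filename.csv", date(2021, 10, 21)))
# /MYPROJECT/2021-10-21-filename.csv
```

The returned string could then be used as the path argument when uploading to the managed folder.
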
  • Ankur30
    edited July 17

    Hi @AlexT,

    Instead of the current date, I want to use the datetime in the path to my S3 bucket, so I changed the line below in the code above.

    variables["standard"]["current-yyyy-mm-dd-var"] = str(pd.to_datetime('now'))

    The job succeeds, but when building the output dataset in S3 I get the error "Root path doesn't exist".

    Can you help me with this?

    Regards,

    Ankur.

  • Alexandru

    The output of str(pd.to_datetime('now')) has a space in it,

    e.g. 2021-10-21 11:15:10.842966

    That could be your issue when trying to check or retrieve the files. You can try looking directly in S3 to see whether the files were created.

    Can you share the code snippet and exact error from the job logs?
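
To illustrate the point about the space, here is a small sketch that formats a datetime without spaces, colons or fractional seconds before it is used in a path. The helper name `path_safe_timestamp` and the chosen format string are assumptions, not something prescribed by Dataiku.

```python
from datetime import datetime


def path_safe_timestamp(moment):
    """Format a datetime using only digits, hyphens and an underscore."""
    return moment.strftime("%Y-%m-%d_%H-%M-%S")


example = datetime(2021, 10, 21, 11, 15, 10, 842966)
print(str(example))                  # 2021-10-21 11:15:10.842966
print(path_safe_timestamp(example))  # 2021-10-21_11-15-10
```

The formatted value can be stored in the project variable in place of the raw str(...) output.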

  • Ankur30

    Hi @AlexT,

    I am not getting any error in the job logs; the job runs fine. I only get the error when building the dataset.

    I am able to see the files in the S3 bucket.

    Regards,

    Ankur.

  • Alexandru

    What is your dataset configuration? What is the blob path? The error suggests the dataset can't reach the path configured for it.

  • Ankur30

    Hi @AlexT,

    Can you please tell me where I can find the details you are asking for, so that I can share a screenshot?

  • Alexandru

    Continuing this in a support ticket and sending us the job diagnostics is probably the best course of action here.

    I don't know exactly which dataset type is failing to build, i.e. whether it is an S3 dataset, a Files in Folder dataset, etc.

    If it's an S3 dataset, you can find the details here:

    Screenshot 2021-10-21 at 14.36.02.png

    For a Files in Folder dataset:

    Screenshot 2021-10-21 at 14.37.48.png

  • Ankur30

    Hi @AlexT,

    I cannot build the dataset in Dataiku when I pass the current date/time as a project variable in the path to my S3 bucket. The dataset builds fine when I pass current_date as the project variable in the path instead. That is the issue.

    The rest of the code you shared is working fine; the only issue is with the date/time variable in my path to the S3 bucket.
