Upload a file to a partitioned Managed Folder

Thomas_il Registered Posts: 3 ✭✭✭
edited July 16 in Using Dataiku

Hi!

With a Python recipe, I am trying to upload a CSV file to a managed folder that is partitioned by day (year-month-day):

[Screenshot attachment: Capture d’écran 2022-05-30 à 18.17.57.png]

I have tried everything I can think of, and each time I get this error:

Invalid partitioning: in running compute_AAAAAAA_2022-05-30: Partitioning scheme is not representable as folders

At the moment, my script looks like this:

import datetime
import dataiku

# Write recipe outputs
output_folder = dataiku.Folder("AAAAAAAA")
# Yesterday's date, computed once so the directory and the file name always agree
yesterday = (datetime.datetime.now() - datetime.timedelta(days=1)).strftime('%Y-%m-%d')
filepath = yesterday + "/"
filename = 'DAILY_' + yesterday + '.csv'
df.to_csv(filename, encoding='utf-8')
# dataiku.Folder uploads a local file with upload_file(path_in_folder, local_file_path)
output_folder.upload_file(filepath + filename, filename)

Any help would be greatly appreciated, as I've been banging my head against the wall on this for quite some time now.

Thomas


Operating system used: MacOS


Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 412 Neuron

    Hello @Thomas_il,

    Some questions before trying to help you:

    • Do you have a source dataset as an input to your Python recipe, or is it a Python recipe with only an output folder?
    • Do you want one folder for each daily file you upload, or do you want all the files in the same folder, following the filename pattern 'DAILY_%Y-%m-%d.csv'?

    With that clear, I think I can give you some ideas.

    Cheers

  • Thomas_il

    Hello @Ignacio_Toledo

    It is indeed a Python recipe with no input (I am using Paramiko to pull data from an SFTP server).

    Regarding the pattern, either choice is OK for me. The point of the partitioned folder would be to pull the latest file from it, using max(folder.list_partitions()).
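
For illustration, here is a minimal sketch of why taking the maximum works for day partitions: identifiers in the 'YYYY-MM-DD' form sort chronologically. The helper name latest_day_partition and the example identifiers are hypothetical, and the list stands in for whatever folder.list_partitions() would return. Parsing each identifier first simply guards against entries that are not valid dates.

```python
import datetime


def latest_day_partition(partition_ids):
    """Return the most recent 'YYYY-MM-DD' identifier, or None if there is none."""
    parsed = []
    for pid in partition_ids:
        try:
            day = datetime.datetime.strptime(pid, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip identifiers that are not day partitions
        parsed.append((day, pid))
    return max(parsed)[1] if parsed else None


# Hypothetical identifiers, standing in for the result of folder.list_partitions()
example_ids = ["2022-05-28", "2022-05-30", "2022-05-29"]
print(latest_day_partition(example_ids))
```

With well-formed day identifiers this gives the same answer as a plain max() on the strings.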

    Thank you for your time and help!

    Thomas

  • Ignacio_Toledo
    edited July 17

    Hi @Thomas_il,

    Given your use case, I think you don't need a partitioned folder in principle: if you write all your files with the name format 'DAILY_%Y-%m-%d.csv' in the root of the output_folder, you can find the most recently written file with a few lines of Python:

    import glob
    import os

    import dataiku

    output_folder = dataiku.Folder("AAAAAAAA")

    # get_path() only works when the folder is stored on the local filesystem
    list_of_files = glob.glob(os.path.join(output_folder.get_path(), '*.csv'))
    # Pick the file with the most recent creation / metadata-change time
    latest_file = max(list_of_files, key=os.path.getctime)
    print(latest_file)

    I think this is the simplest option. However, learning to use partitioned folders and datasets is useful, so if you are interested in the alternative solution, let me know and I can share another example.
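
As a variation on the glob approach, here is a rough sketch that picks the latest file from the date embedded in its name rather than from the file's timestamp, which may be useful if files are ever copied or re-uploaded out of order. The helper name latest_daily_file and the example paths are hypothetical; in practice the list could be the glob result for the folder.

```python
import datetime
import os
import re

# Matches names such as DAILY_2022-05-30.csv and captures the date part
DAILY_PATTERN = re.compile(r"^DAILY_(\d{4}-\d{2}-\d{2})\.csv$")


def latest_daily_file(paths):
    """Return the path whose file name carries the most recent date, or None."""
    dated = []
    for path in paths:
        match = DAILY_PATTERN.match(os.path.basename(path))
        if not match:
            continue  # ignore files that do not follow the naming pattern
        try:
            day = datetime.datetime.strptime(match.group(1), "%Y-%m-%d").date()
        except ValueError:
            continue  # ignore impossible dates such as 2022-13-40
        dated.append((day, path))
    return max(dated)[1] if dated else None


# Hypothetical paths, standing in for the list returned by glob.glob(...)
example_paths = ["/data/DAILY_2022-05-29.csv", "/data/DAILY_2022-05-30.csv", "/data/notes.txt"]
print(latest_daily_file(example_paths))
```

The trade-off is that this relies on the naming convention being respected, whereas the timestamp approach relies on the files being written in chronological order.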

    Cheers

  • Thomas_il

    Hi @Ignacio_Toledo

    Thanks for this solution, it works perfectly.

    And yes, I would be very happy to learn how to do this with partitioned folders! I will need them in the future, and knowing how they work would be great.

    Again, thanks for your time and help.

    Thomas
