Creating a folder within a folder

kman88
kman88 Registered Posts: 5 ✭✭✭✭

Hello, let's say I want to create a folder that contains several other folders. I could just drag all the files into a single folder, but for organizational purposes a nested structure makes a lot more sense. Is there a way to do this in DSS? If not, what is the workaround? I cannot seem to find out how. Any information would be greatly appreciated. Thanks very much!

Answers

  • Ignacio_Toledo
    Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron

    Hi @kman88,

    I guess you are talking about the managed folders within a DSS project, no? Assuming so, there are two options:

    • You can create a folder within the managed folder, as seen in the next screenshot:
      new_folder.png
    • Creating a complex structure of folders and sub-folders that way would be slow, though. The other option applies if you want to copy data files from a local folder that already has an organized structure: zip the folder, copy the zip file into the managed folder, and then use the option to unzip it. Here is a link to a screencast showing how to do this: https://youtu.be/poQ0ntZwMDs (a bit long because the upload took longer than expected, but you can skip that part)
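
The zip step in the second option can also be scripted. This is only a minimal sketch using Python's standard library; the folder name my_data and the files in it are made up for illustration, and the upload and unzip inside DSS are still done by hand as in the screencast:

```python
import os
import shutil
import tempfile
import zipfile

# Build a small example tree to zip (all names here are hypothetical).
work = tempfile.mkdtemp()
src = os.path.join(work, "my_data")
for sub, name in [("raw/2023", "a.csv"), ("raw/2024", "b.csv"), ("reports", "summary.txt")]:
    target_dir = os.path.join(src, *sub.split("/"))
    os.makedirs(target_dir, exist_ok=True)
    with open(os.path.join(target_dir, name), "w") as f:
        f.write("example\n")

# Zip the tree. root_dir/base_dir keep the paths inside the archive
# relative to "my_data", so the sub-folder layout is preserved.
archive_path = shutil.make_archive(
    os.path.join(work, "my_data"), "zip", root_dir=work, base_dir="my_data"
)

# List the files stored in the archive (directory entries are skipped).
with zipfile.ZipFile(archive_path) as zf:
    names = sorted(n for n in zf.namelist() if not n.endswith("/"))

print(archive_path)
print(names)
```

The resulting my_data.zip is the file you would then copy into the managed folder and unzip there.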

    There is another option, which is to use a Python script to create the folder structure from scratch and then add the files to the right directories. If you are interested in that, let me know.
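
A rough sketch of what the Python route could look like. The idea is that writing a file whose path contains "/" is enough to get the sub-folders, so no separate "create directory" step is assumed. LocalFolderStub and write_tree are hypothetical names: the stub only imitates the upload_data(path, data) method of dataiku.Folder so that the example runs outside DSS, and the folder and file names are invented. Inside a recipe you would pass a real dataiku.Folder instead; check the behavior on your own connection type.

```python
import os
import tempfile


class LocalFolderStub:
    """Stand-in for dataiku.Folder so this sketch runs outside DSS (hypothetical)."""

    def __init__(self, root):
        self.root = root

    def upload_data(self, path, data):
        # Write bytes to a path inside the folder, creating parent directories as needed.
        target = os.path.join(self.root, *path.strip("/").split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as f:
            f.write(data)


def write_tree(folder, files):
    """Write {relative_path: bytes} into the folder; "/" in a path implies sub-folders."""
    written = []
    for rel_path, data in sorted(files.items()):
        folder.upload_data(rel_path, data)
        written.append(rel_path)
    return written


root = tempfile.mkdtemp()
folder = LocalFolderStub(root)  # inside DSS: dataiku.Folder("your_folder_name") (assumed name)
written = write_tree(folder, {
    "raw/2023/a.txt": b"alpha",
    "raw/2024/b.txt": b"beta",
    "reports/summary.txt": b"done",
})
print(written)
```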

    Hope this helps!

  • DD123
    DD123 Registered Posts: 1 ✭✭✭

    @Ignacio_Toledo
    Hey, thank you for posting this. This is closely related to a project I am working on. Could you post a potential solution with a Python script? I am working on a step that auto-creates subfolders within a managed folder and saves files into them. Many thanks!

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
    edited July 17

    @DD123 & @Ignacio_Toledo,

    I was working on a very similar problem over the weekend. I hope that this might be of help. It should not matter whether your managed folder is local or, like mine, connected over SFTP.

    Code to Create a Folder in a Programaticly Named Directory.png

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
    # # Copy Dataset to Managed SFTP Connected folder

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # -*- coding: utf-8 -*-
    import dataiku
    import pandas as pd, numpy as np
    from dataiku import pandasutils as pdu

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
    # ## Get File Year & Month

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # These variables were set by another process and are part of the project's local variables
    file_year = dataiku.get_custom_variables()["file_year"]
    file_month = dataiku.get_custom_variables()["file_month"]

    print("Imported File Year =",file_year, "Month =", file_month)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # This creates one tab-separated file for each given dataset, in a form that MS Windows SQL Server can ingest
    def Upload_TSV(input_data, dest_folder):
        #print(input_data)
        #print(dest_folder)

        # get the input data
        input_handle = dataiku.Dataset(input_data)
        input_df = input_handle.get_dataframe(infer_with_pandas=False)

        # get a file name based on the name of the input data (full_name is "PROJECT.dataset")
        data_source = input_handle.full_name
        # create a file name with a prefixed folder name to put in your managed folder;
        # the "/" in the name is what places the file in a sub-folder
        file_name = "Created_Folder-" + file_year + "-" + file_month + "/" + data_source.split(".", 1)[1] + ".txt"
        #print(file_name)

        # set up the destination folder
        dest_handle = dataiku.Folder(dest_folder)

        # write the data as tab-separated values with no index or headers, using MS Windows style line terminators
        # (pandas 1.5+ renamed this argument to lineterminator; line_terminator was removed in pandas 2.0)
        with dest_handle.get_writer(file_name) as writer:
            writer.write(input_df.to_csv(sep="\t", line_terminator="\r\n",
                                         index=False, header=False).encode("utf-8"))

        print('Wrote file', file_name, 'to', dest_folder)

    # -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
    # The first parameter is the input dataset already connected to this Python recipe.
    # The second is the name of the managed folder you want connected to this Python recipe.
    Upload_TSV("ENHANCED_RESULTS", "Data_Managed_Folder")
    Upload_TSV("ENHANCED_RESULTS_Corrected", "Data_Managed_Folder")
    Upload_TSV("STANDARD_RESULTS", "Data_Managed_Folder")
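
For anyone who wants to check the sub-folder naming without running the recipe in DSS, here is a small sketch of just the path-building step in plain Python. The helper name build_target_path and the sample values are hypothetical; the logic mirrors the file_name line in the recipe, assuming the dataset's full name has the form "PROJECT.dataset".

```python
import posixpath


def build_target_path(full_name, file_year, file_month,
                      prefix="Created_Folder", ext=".txt"):
    """Return '<prefix>-<year>-<month>/<dataset><ext>' for a 'PROJECT.dataset' name."""
    # Drop the project key: "MYPROJECT.ENHANCED_RESULTS" -> "ENHANCED_RESULTS"
    dataset_name = full_name.split(".", 1)[1]
    subfolder = "{}-{}-{}".format(prefix, file_year, file_month)
    # Managed-folder paths use "/" regardless of the operating system.
    return posixpath.join(subfolder, dataset_name + ext)


path = build_target_path("MYPROJECT.ENHANCED_RESULTS", "2023", "07")
print(path)  # Created_Folder-2023-07/ENHANCED_RESULTS.txt
```

Passing a path like this to get_writer is what puts the file inside the dated sub-folder.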

    Hope this is of help to someone along the way.

    There is a related post here.
