
Creating a folder within a folder

kman88
Level 1

Hello, let's say I want to create a folder that then contains several other folders. Of course I could just drag all the files into a single folder, but for organizational purposes a nested structure makes a lot more sense. Is there a way to do this in DSS? If not, what is the workaround? I cannot seem to find out how. Any information would be greatly appreciated. Thanks very much!

Ignacio_Toledo

Hi @kman88,

I guess you are talking about the managed folders within a DSS project, no? Assuming so, there are two options:

  • You can create a folder within the managed folder as seen in the next screen shot:
    new_folder.png
  • Creating a complex structure of folders and sub-folders this way would be slow, though. The other option is to copy your files from a local folder that already has an organized structure: zip the folder, copy the zip file into the managed folder, and then use the option to unzip it. Here is a link to a screencast showing how to do this: https://youtu.be/poQ0ntZwMDs (a bit long because the upload took longer than expected... but you can skip that part 🙂 )
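
If you would rather script the zipping step than do it by hand, here is a minimal sketch using only the Python standard library. The folder and file names (`my_data`, `my_data_upload`, `a.csv`) are made up for illustration, and the upload and unzip inside the managed folder are still done through the DSS interface as in the screencast.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def zip_folder(source_dir, archive_base):
    """Zip source_dir (including its sub-folders) and return the archive path."""
    source = Path(source_dir)
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so unzipping recreates "my_data/..." rather than a pile of loose files.
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=str(source.parent), base_dir=source.name
    )


# Hypothetical example structure, built in a temporary directory.
work = Path(tempfile.mkdtemp())
(work / "my_data" / "2021" / "raw").mkdir(parents=True)
(work / "my_data" / "2021" / "raw" / "a.csv").write_text("x,y\n1,2\n")

archive_path = zip_folder(work / "my_data", work / "my_data_upload")

# List what ended up in the archive to confirm the sub-folders were kept.
with zipfile.ZipFile(archive_path) as zf:
    names = zf.namelist()
print(archive_path)
print(names)
```

The resulting `.zip` file is what you would drop into the managed folder before using the unzip option.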

There is another option, which is to use a Python script to create the structure from scratch and then add the files to the right directories. If you are interested in that, let me know.

Hope this helps!

DD123
Level 1

@Ignacio_Toledo Hey, thank you for posting this. This is closely related to a project I am working on. Could you post a potential solution using a Python script? I am working on a step that can auto-create subfolders within a managed folder and save files there. Many thanks!

tgb417
Neuron

@DD123 & @Ignacio_Toledo ,

I was working on a very similar problem over the weekend. I hope that this might be of help. It should not matter whether your managed folder is local or, like mine, connected over SFTP.

Code to create a folder in a programmatically named directory:

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
# # Copy Dataset to Managed SFTP Connected folder

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
# ## Get File Year & Month

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# These variables were set by another process and are part of the project's local variables
file_year = dataiku.get_custom_variables()["file_year"]
file_month = dataiku.get_custom_variables()["file_month"]

print("Imported File Year =",file_year, "Month =", file_month)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# This creates one tab-separated file for each given dataset, in a form that MS Windows SQL Server can ingest
def Upload_TSV(input_data, dest_folder):
    #print(input_data)
    #print(dest_folder)

    # get the input data
    input_handle = dataiku.Dataset(input_data)
    input_df = input_handle.get_dataframe(infer_with_pandas=False)

    # get a file name based on the name of the input data
    data_source = input_handle.full_name
    # create a file name with a prefixed folder name to put in your managed folder;
    # the "/" in the name is what places the file inside a sub-folder
    file_name = "Created_Folder-"+file_year+"-"+file_month+"/"+data_source.split(".",1)[1]+'.txt'
    #print(file_name)

    # set up the destination folder
    dest_handle = dataiku.Folder(dest_folder)

    # write the data as tab-separated values with no index or headers, using MS Windows style line terminators
    # note: pandas 1.5+ renames line_terminator to lineterminator (the old name was removed in pandas 2.0)
    with dest_handle.get_writer(file_name) as writer:
        writer.write(input_df.to_csv(sep="\t", line_terminator="\r\n",
                                     index=False, header=False).encode("utf-8"))

    print('Wrote file', file_name, 'to', dest_folder)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# The first parameter is the input dataset already connected to this Python recipe.
# The second is the name of the managed folder you want connected to this Python recipe.
Upload_TSV("ENHANCED_RESULTS", "Data_Managed_Folder")
Upload_TSV("ENHANCED_RESULTS_Corrected", "Data_Managed_Folder")
Upload_TSV("STANDARD_RESULTS", "Data_Managed_Folder")
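
For anyone who wants to try the sub-folder naming logic outside DSS, here is a minimal sketch of just that step. `build_target_path` and `FakeFolder` are hypothetical helpers written for this illustration (they are not part of the dataiku package), and the project key, year and month values are made up. Inside a recipe you would pass the path to `dataiku.Folder(...).get_writer(...)` as in the code above.

```python
def build_target_path(full_name, file_year, file_month, prefix="Created_Folder"):
    """Return 'prefix-YYYY-MM/DATASET.txt' for a 'PROJECT.DATASET' full name."""
    dataset_name = full_name.split(".", 1)[1]  # drop the project key
    # The "/" is what puts the file into a sub-folder of the managed folder.
    return f"{prefix}-{file_year}-{file_month}/{dataset_name}.txt"


class _MemoryWriter:
    """Collects written bytes and stores them under a path when the block exits."""

    def __init__(self, store, path):
        self._store, self._path, self._chunks = store, path, []

    def write(self, data):
        self._chunks.append(data)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._store[self._path] = b"".join(self._chunks)
        return False


class FakeFolder:
    """Hypothetical in-memory stand-in for a managed folder (illustration only)."""

    def __init__(self):
        self.files = {}

    def get_writer(self, path):
        return _MemoryWriter(self.files, path)


# Made-up values standing in for the project variables used in the recipe.
file_year, file_month = "2021", "10"
path = build_target_path("MYPROJECT.ENHANCED_RESULTS", file_year, file_month)

folder = FakeFolder()
with folder.get_writer(path) as writer:
    writer.write("a\tb\r\n".encode("utf-8"))

print(path)
print(sorted(folder.files))
```

Keeping the path building in its own small function makes it easy to check the folder names before pointing the recipe at a real managed folder.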

Hope this is of help to someone along the way.

There is a related post here.

--Tom