Hello, let's say I want to create a folder that contains several other folders. Of course I could just put all the files into a single folder, but for organizational purposes a nested structure makes a lot more sense in my case. Is there a way to do this in DSS? If not, what is the workaround? I cannot seem to find out how. Any information would be greatly appreciated. Thanks very much!
Hi @kman88,
I guess you are talking about the managed folders within a DSS project, no? Assuming so, there are two options:
There is another option: use a Python script to create the folder structure from scratch and then write the files into the right directories. If you are interested in that, let me know.
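A minimal sketch of that approach, assuming that writing to a managed-folder path containing `/` places the file in a subfolder of that name (this is what the script later in this thread relies on). The `FakeFolder` class and the `write_nested` helper are hypothetical, written only so the example runs outside DSS; inside a recipe you would pass `dataiku.Folder("your_folder")` instead, which in recent DSS versions provides an `upload_data(path, data)` method (check your version's API docs).

```python
import posixpath
from typing import Dict, Iterable, List, Tuple


class FakeFolder:
    """Hypothetical in-memory stand-in for a DSS managed folder (illustration only)."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def upload_data(self, path: str, data: bytes) -> None:
        # Store the bytes under the full path; a real managed folder creates subfolders as needed
        self.files[path] = data


def write_nested(folder, root: str, files: Iterable[Tuple[str, str, bytes]]) -> List[str]:
    """Write each (subfolder, file_name, data) triple to root/subfolder/file_name.

    `folder` is anything with an upload_data(path, data) method, e.g. dataiku.Folder(...).
    Returns the list of paths written.
    """
    written = []
    for subfolder, file_name, data in files:
        # Managed-folder paths use "/" separators, so posixpath is used on every OS
        path = posixpath.join(root, subfolder, file_name)
        folder.upload_data(path, data)
        written.append(path)
    return written


folder = FakeFolder()
paths = write_nested(folder, "reports", [
    ("2023-01", "sales.txt", b"a\tb\r\n"),
    ("2023-02", "sales.txt", b"c\td\r\n"),
])
print(paths)  # ['reports/2023-01/sales.txt', 'reports/2023-02/sales.txt']
```

Because the helper only depends on `upload_data`, the same code should work whether the managed folder is stored locally or on a remote connection.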
Hope this helps!
@Ignacio_Toledo Hey, thank you for posting this. It is closely related to a project I am working on. Could you post a potential solution using a Python script? I am working on a step that automatically creates subfolders within a managed folder and saves files into them. Many thanks!
I was working on a very similar problem over the weekend; I hope this helps. It should not matter whether your managed folder is local or, like mine, connected over SFTP.
```python
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
# # Copy Dataset to Managed SFTP Connected folder
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: MARKDOWN
# ## Get File Year & Month
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# These variables were set by another process and are part of the project's local variables
custom_vars = dataiku.get_custom_variables()
file_year = custom_vars["file_year"]
file_month = custom_vars["file_month"]
print("Imported File Year =", file_year, "Month =", file_month)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# This creates one tab-separated file for each given dataset, in a format that Microsoft SQL Server on Windows can ingest
def Upload_TSV(input_data, dest_folder):
    # Get the input data
    input_handle = dataiku.Dataset(input_data)
    input_df = input_handle.get_dataframe(infer_with_pandas=False)
    # Derive the file name from the input dataset's full name ("PROJECTKEY.DATASET" -> "DATASET")
    data_source = input_handle.full_name
    # Prefix the file name with a folder name; the "/" places the file in that subfolder of the managed folder
    file_name = f"Created_Folder-{file_year}-{file_month}/{data_source.split('.', 1)[1]}.txt"
    # Set up the destination managed folder
    dest_handle = dataiku.Folder(dest_folder)
    # Write the data as tab-separated values with no index or header, using Windows-style (CRLF) line endings.
    # pandas >= 1.5 names this argument lineterminator (older versions used line_terminator)
    with dest_handle.get_writer(file_name) as writer:
        writer.write(input_df.to_csv(sep="\t", lineterminator="\r\n",
                                     index=False, header=False).encode("utf-8"))
    print("Wrote file", file_name, "to", dest_folder)
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# The first parameter is the input dataset, already connected to this Python recipe.
# The second is the name of the managed folder you want connected to this Python recipe.
Upload_TSV("ENHANCED_RESULTS", "Data_Managed_Folder")
Upload_TSV("ENHANCED_RESULTS_Corrected", "Data_Managed_Folder")
Upload_TSV("STANDARD_RESULTS", "Data_Managed_Folder")
```
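The core of `Upload_TSV` is building the subfolder path and converting the DataFrame to bytes; both parts can be factored out and checked without DSS. A minimal sketch, assuming pandas 1.5 or newer (where the keyword is `lineterminator`); `subfolder_path` and `to_sql_server_tsv` are hypothetical names, and whether your SQL Server import expects exactly this layout is something to verify on your side.

```python
import pandas as pd


def subfolder_path(file_year: str, file_month: str, dataset_full_name: str) -> str:
    """Build "Created_Folder-<year>-<month>/<DATASET>.txt", mirroring Upload_TSV.

    dataset_full_name is "PROJECTKEY.DATASET"; only the part after the first "." is kept.
    """
    dataset_name = dataset_full_name.split(".", 1)[1]
    return f"Created_Folder-{file_year}-{file_month}/{dataset_name}.txt"


def to_sql_server_tsv(df: pd.DataFrame) -> bytes:
    """Serialize df as tab-separated values, no index or header, CRLF line endings, UTF-8."""
    text = df.to_csv(sep="\t", lineterminator="\r\n", index=False, header=False)
    return text.encode("utf-8")


df = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
payload = to_sql_server_tsv(df)
path = subfolder_path("2023", "07", "MYPROJECT.ENHANCED_RESULTS")
print(path)     # Created_Folder-2023-07/ENHANCED_RESULTS.txt
print(payload)  # b'1\tx\r\n2\ty\r\n'
```

Inside a recipe, `payload` would then be written with `dataiku.Folder(...).get_writer(path)` exactly as in the script above.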
Hope this is of help to someone along the way.
There is a related post here.