Want to write multiple dataframes to separate CSV files in a folder
I want to export 5 datasets (hundreds more later) to 5 CSV files in a folder. The error is:
<type 'exceptions.AttributeError'>: 'str' object has no attribute 'to_csv'
Or is there a better way to do it?
test1 = dataiku.Dataset("test1")
df1 = test1.get_dataframe()
test2 = dataiku.Dataset("test2")
df2 = test2.get_dataframe()
test3 = dataiku.Dataset("test3")
df3 = test3.get_dataframe()
test4 = dataiku.Dataset("test4")
df4 = test4.get_dataframe()
test5 = dataiku.Dataset("test5")
df5 = test5.get_dataframe()

files=["df1","df1","df3","df3","df4","df5"]
i = 0
l=0
while i < len(files):
    folder = dataiku.Folder("8eeWrtwq")
    with folder.get_writer(files[l]) as w:
        w.write("files[l]".to_csv(sep="\t",header=False,index=False))
    i = i + 1
    l = l + 1
My Python code is above.
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Please use a code block (the </> icon in the toolbar) to post your code. Otherwise the indentation is not preserved, which makes the code hard to copy, paste and execute.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Your code had three issues. First, "files[l]" is a string literal, not a dataframe object, which is why you get the error. Second, you need to encode the df.to_csv() output before passing it to a folder writer. Finally, you need to name the file properly.
Here is code that works:
import dataiku

def write_file_to_folder(df, dataset_name, folder):
    # Name the output file after the dataset
    file_name = dataset_name + ".csv"
    with folder.get_writer(file_name) as w:
        # to_csv() returns a string; encode it before writing to the folder
        w.write(df.to_csv(sep="\t", header=False, index=False).encode('utf-8'))

output_folder = dataiku.Folder("8eeWrtwq")

ds1 = dataiku.Dataset("test1")
df1 = ds1.get_dataframe()
write_file_to_folder(df1, ds1.short_name, output_folder)

ds2 = dataiku.Dataset("test2")
df2 = ds2.get_dataframe()
write_file_to_folder(df2, ds2.short_name, output_folder)
Note that this doesn't produce a proper CSV file, since your code removes the header and sets the separator to tabs. So these files are really tab-separated text files, not CSV (comma-separated values) files.
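The same pattern can be driven from a list of dataset names instead of one variable per dataframe. This is only a sketch: `export_datasets` and its `load_dataframe` argument are hypothetical names introduced here, not part of the Dataiku API. It assumes the folder object offers `get_writer()` as used above, and that every dataset in the list is declared as an input of the recipe.

```python
def export_datasets(dataset_names, load_dataframe, folder):
    """Write each named dataset to <name>.csv in `folder`.

    load_dataframe: callable taking a dataset name and returning a dataframe
                    (hypothetical hook, see the recipe wiring below).
    folder:         object with a get_writer(file_name) context manager.
    Returns the list of file names written.
    """
    written = []
    for name in dataset_names:
        df = load_dataframe(name)
        file_name = name + ".csv"
        # to_csv() returns a string; encode it before handing it to the writer
        payload = df.to_csv(sep="\t", header=False, index=False).encode("utf-8")
        with folder.get_writer(file_name) as w:
            w.write(payload)
        written.append(file_name)
    return written


# Inside a Dataiku Python recipe this could be wired up roughly as:
#   import dataiku
#   export_datasets(
#       ["test1", "test2", "test3", "test4", "test5"],
#       lambda name: dataiku.Dataset(name).get_dataframe(),
#       dataiku.Folder("8eeWrtwq"),
#   )
```

Passing names rather than strings like "df1" avoids the original error, because each dataframe is looked up from its dataset inside the loop.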
-
Thanks, this cleared things up. But for hundreds of files (as mentioned in my initial query), is there a way to pass the dataframes in a list or a dictionary so that the loop picks them up one by one and writes them? All the datasets will be added as inputs to the Python recipe.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,160 Neuron
Can you please explain what exactly you are trying to achieve, rather than the steps you think are best to get there? Where are these hundred datasets? Why do you have a hundred? Why do you need to export them to CSV? Thanks
-
LouisDHulst Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Neuron, Registered, Neuron 2023 Posts: 54 Neuron
If you're going to be dealing with 100s of files, or if this task is going to be repeated a lot in the future, you're probably better off not creating 100s of Dataiku datasets in your flow and connecting them to a Python recipe. That's going to take a lot of time and will be difficult to manage.
Where are you getting the data from? If these are .csv files that you're uploading to Dataiku, you can put them all into the same folder and use that folder as the input of your recipe. That will allow you to loop through a large number of files and to have a clean Flow.
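Looping over a folder's files could look roughly like the sketch below. `process_folder_files` and `handle_file` are hypothetical names introduced for illustration. The sketch assumes the folder object provides `list_paths_in_partition()` and `get_download_stream()`, as `dataiku.Folder` does, and that the files end in `.csv`.

```python
def process_folder_files(folder, handle_file, suffix=".csv"):
    """Apply handle_file to every matching file in a managed folder.

    handle_file: callable taking an open binary stream (hypothetical hook,
                 for example pandas.read_csv).
    Returns a dict mapping each file path to handle_file's result.
    """
    results = {}
    for path in folder.list_paths_in_partition():
        # Skip anything that is not one of the expected files
        if not path.lower().endswith(suffix):
            continue
        with folder.get_download_stream(path) as stream:
            results[path] = handle_file(stream)
    return results


# Inside a Dataiku Python recipe this could be wired up roughly as
# ("input_folder_id" is a placeholder, not a real folder id):
#   import dataiku
#   import pandas as pd
#   frames = process_folder_files(dataiku.Folder("input_folder_id"), pd.read_csv)
```

With this approach the recipe has a single folder as input, however many files it contains.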
If you're using an SQL db connection a different solution will be needed. We need more information to help you.