Want to load multiple dataframes into separate CSV files in a folder

shreyass
edited July 16 in Using Dataiku

I want to export 5 datasets (with hundreds more later) to 5 separate CSV files in a folder, but I get this error:

<type 'exceptions.AttributeError'>: 'str' object has no attribute 'to_csv'

Or is there a better way to do it?

test1 = dataiku.Dataset("test1")
df1 = test1.get_dataframe()


test2 = dataiku.Dataset("test2")
df2 = test2.get_dataframe()

test3 = dataiku.Dataset("test3")
df3 = test3.get_dataframe()

test4 = dataiku.Dataset("test4")
df4 = test4.get_dataframe()

test5 = dataiku.Dataset("test5")
df5 = test5.get_dataframe()


files=["df1","df1","df3","df3","df4","df5"]

i = 0
l = 0
while i < len(files):
    folder = dataiku.Folder("8eeWrtwq")
    with folder.get_writer(files[l]) as w:
        w.write("files[l]".to_csv(sep="\t",header=False,index=False))
    i = i + 1
    l = l + 1

My Python code is above.

Answers

  • Turribeach

    Please use a code block (the </> icon in the toolbar) to post your code; otherwise the indentation is lost, which makes it hard to copy/paste and run.

  • Turribeach
    edited July 17

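    Your code had three issues. First, "files[l]" in quotes is a string literal, and even files[l] without quotes would only give you the string "df1", not the DataFrame itself. A string has no to_csv() method, which is why you get the error. Second, the folder writer works with bytes, so you need to encode the df.to_csv() output before passing it to the writer. Finally, you need to give each file a proper name (for example the dataset name plus a .csv extension).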
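To see the first two issues in isolation, here is a small pandas-only illustration with a made-up DataFrame (no Dataiku objects involved). A list of names only holds strings, so calling `to_csv` on an element fails with the same error, while keeping the DataFrames themselves in a dict works. `to_csv()` without a path returns a `str`, which `.encode('utf-8')` turns into bytes.

```python
import pandas as pd

# Made-up stand-in for a DataFrame returned by get_dataframe()
df1 = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Issue 1: a list of names holds strings, not DataFrames
files = ["df1"]
try:
    files[0].to_csv(index=False)
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'to_csv'

# Keeping the DataFrame objects themselves (keyed by name) avoids the error
frames = {"df1": df1}
text = frames["df1"].to_csv(index=False)

# Issue 2: to_csv() with no path returns a str; encoding it yields bytes
payload = text.encode("utf-8")
print(type(text).__name__, type(payload).__name__)  # str bytes
```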

    Here is code that works:

    import dataiku

    def write_file_to_folder(df, dataset_name, folder):
        # Name the output file after the source dataset
        file_name = dataset_name + ".csv"
        with folder.get_writer(file_name) as w:
            # to_csv() with no path returns text; encode it to bytes for the writer
            w.write(df.to_csv(sep="\t", header=False, index=False).encode('utf-8'))
    
    output_folder = dataiku.Folder("8eeWrtwq")
    
    ds1 = dataiku.Dataset("test1")
    df1 = ds1.get_dataframe()
    write_file_to_folder(df1, ds1.short_name, output_folder)
    
    ds2 = dataiku.Dataset("test2")
    df2 = ds2.get_dataframe()
    write_file_to_folder(df2, ds2.short_name, output_folder)

    Note that this doesn't produce a proper CSV file, since your code removes the header and sets the separator to tabs. These files are really tab-separated text files, not CSV (comma-separated values) files.
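If a real CSV is wanted, pandas' defaults already produce one: a comma separator and a header row. The comparison below uses a small made-up DataFrame to show the two outputs side by side. Inside `write_file_to_folder` the write line would then become `w.write(df.to_csv(index=False).encode('utf-8'))`.

```python
import pandas as pd

# Made-up example data
df = pd.DataFrame({"id": [1, 2], "city": ["Paris", "Lyon"]})

# Options used in the original code: tab separator, no header row
tsv_text = df.to_csv(sep="\t", header=False, index=False)

# pandas defaults: comma separator plus a header row (index still dropped)
csv_text = df.to_csv(index=False)

print(tsv_text.splitlines())  # ['1\tParis', '2\tLyon']
print(csv_text.splitlines())  # ['id,city', '1,Paris', '2,Lyon']
```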

  • shreyass

    Thanks, that clears things up. But since I'll eventually have hundreds of files to write (as mentioned in my initial question), is there a way to put the dataframes in a list or dictionary so that a loop picks them up one by one and writes each one? (All the datasets will be added as inputs to the Python recipe.)
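Here is a minimal sketch of that list-driven loop. It reuses the pattern from Turribeach's `write_file_to_folder` but writes real comma-separated files with a header. The helper names `write_df_to_folder` and `export_datasets` are hypothetical. The loader is passed in as a function, so a dict of already-loaded DataFrames (`frames.__getitem__`) works just as well as `dataiku.Dataset(name).get_dataframe()`. As in this thread, it assumes the folder writer accepts bytes.

```python
from typing import Callable, Iterable, List

import pandas as pd


def write_df_to_folder(df: pd.DataFrame, name: str, folder) -> str:
    """Write df into `folder` as <name>.csv (comma-separated, with header)."""
    file_name = name + ".csv"
    with folder.get_writer(file_name) as w:
        # The writer is fed bytes, so encode the text returned by to_csv()
        w.write(df.to_csv(index=False).encode("utf-8"))
    return file_name


def export_datasets(names: Iterable[str],
                    load_df: Callable[[str], pd.DataFrame],
                    folder) -> List[str]:
    """Load each named dataset with load_df, write it to folder, return file names."""
    written = []
    for name in names:
        df = load_df(name)
        written.append(write_df_to_folder(df, name, folder))
    return written


# Inside a DSS Python recipe this could be wired up roughly as follows
# (dataiku.Dataset, get_dataframe and dataiku.Folder are the calls used in this thread):
#
#   import dataiku
#   folder = dataiku.Folder("8eeWrtwq")
#   names = ["test1", "test2", "test3", "test4", "test5"]
#   export_datasets(names, lambda n: dataiku.Dataset(n).get_dataframe(), folder)
```

For hundreds of datasets this still means hundreds of recipe inputs, which is what the replies below caution against.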

  • Turribeach

    Can you please explain what exactly you are trying to achieve, rather than the steps you think are best to get there? Where are these hundred datasets? Why do you have a hundred? Why do you need to export them to CSV? Thanks

  • LouisDHulst

    If you're going to be dealing with hundreds of files, or if this task will be repeated often, you're probably better off not creating hundreds of Dataiku datasets in your Flow and connecting them all to a Python recipe. That takes a lot of time and is difficult to manage.

    Where are you getting the data from? If these are .csv files that you're uploading to Dataiku, you can put them all into the same folder and use that folder as the input of your recipe. That will allow you to loop through a large number of files and to have a clean Flow.
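Here is a minimal sketch of the folder-as-input approach, assuming the input folder holds plain comma-separated files. It loops over the folder's paths with `list_paths_in_partition()`, opens each CSV with `get_download_stream()` (both `dataiku.Folder` methods), and reads it into a DataFrame keyed by path. Because the function only relies on those two methods, it can be tried outside DSS with a stand-in object. `"INPUT_FOLDER_ID"` in the usage comment is a hypothetical placeholder, not an id from this thread.

```python
from typing import Dict

import pandas as pd


def read_csv_files(folder, suffix: str = ".csv") -> Dict[str, pd.DataFrame]:
    """Read every file in `folder` whose path ends with `suffix` into a DataFrame.

    `folder` only needs list_paths_in_partition() and get_download_stream(path),
    the two dataiku.Folder methods this sketch relies on.
    """
    frames = {}
    for path in folder.list_paths_in_partition():
        if not path.lower().endswith(suffix):
            continue  # skip files that are not CSVs
        with folder.get_download_stream(path) as stream:
            frames[path] = pd.read_csv(stream)
    return frames


# Usage inside a DSS Python recipe whose input is a managed folder
# ("INPUT_FOLDER_ID" is a hypothetical placeholder for the folder's id):
#
#   import dataiku
#   input_folder = dataiku.Folder("INPUT_FOLDER_ID")
#   for path, df in read_csv_files(input_folder).items():
#       print(path, df.shape)
```

Each DataFrame in the returned dict can then be processed and written out with the same folder-writer pattern used earlier in the thread.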

    If you're using an SQL database connection, a different solution will be needed. We need more information to help you.
