Output tab file to managed folder in DSS

ele_f

Hi,

I have a dataframe in my DSS workflow which I want to change and store in a non-csv file within a folder.

Assume my dataframe is called df. For this example, you can recreate it as follows:

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10], "c": [11, 12, 13, 14, 15]})

I now want to add a few comment lines above the dataframe and then automatically save the file in a folder.

First, I loaded my dataset into a folder ("my_input_folder") with the DSS "Export to folder" recipe, naming the file df.csv. Then I added a Python recipe that reads the file, adds the comments and outputs the result to another folder ("my_output_folder"). The code is below, but it doesn't produce what I want. Could you please help?

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os.path

# Recipe inputs
folder_path = dataiku.Folder("my_input_folder").get_path()
path_of_csv = os.path.join(folder_path, "df.csv")

# Recipe outputs
output2 = dataiku.Folder("my_output_folder")
output2_path = output2.get_path()

completeName = os.path.join(folder_path, "df.csv")
file1 = open(completeName, "w")
toFile = raw_input("# This is my first comment\n This is my other comment \n") # I need to write two comments on two different rows
file1.write(toFile)
file1.close()
dirPath2 = os.path.join(output2_path, file1)

Thank you!


Answers

  • Alex_Combessie
    Hello,

    What is the expected format and content of your output? Could you give us an example in text format? We would like to better understand the goal of adding comments to a csv file.

    From our understanding, a possibility for you could be to:

    - use a python recipe reading the original csv as a pandas dataframe,

    - add your comments inside the dataframe using the pandas merge or append methods,

    - write your dataframe to a file using one of the pandas writer methods: https://pandas.pydata.org/pandas-docs/stable/api.html#id12

    Cheers,

    Alex
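A minimal sketch of the first and last of those steps (reading the exported CSV with pandas, then writing it back out with a pandas writer), assuming the input folder is on the local filesystem and that the exported df.csv has a header row and no index column. The function name csv_to_tab and the temporary directory standing in for the managed folders are hypothetical; step 2 (adding comments) is deliberately left out here.

```python
import os
import tempfile

import pandas as pd


def csv_to_tab(csv_path, tab_path):
    """Read a CSV file into a DataFrame, then write it out tab-separated."""
    # Step 1: read the original CSV as a pandas DataFrame
    df = pd.read_csv(csv_path)
    # (Step 2, adding the comments, is not part of this sketch.)
    # Step 3: write with a pandas writer; sep="\t" produces tab-separated text
    df.to_csv(tab_path, sep="\t", index=False)
    return df


# A temporary directory stands in for the managed folders (assumption)
work_dir = tempfile.mkdtemp()
csv_path = os.path.join(work_dir, "df.csv")
tab_path = os.path.join(work_dir, "df.tab")

# Stand-in for the df.csv written by "Export to folder" (header, no index assumed)
pd.DataFrame({"a": [1, 2, 3, 4, 5],
              "b": [6, 7, 8, 9, 10],
              "c": [11, 12, 13, 14, 15]}).to_csv(csv_path, index=False)

df = csv_to_tab(csv_path, tab_path)
```

Passing index=False keeps the pandas row index out of the file; drop that argument if the index column should be written too.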
  • ele_f
    Hi Alex,
    thanks for your reply.
    I think what you advised is similar to what I did (the Python code above was used to read the file and add the comment).
    However, once I add the comment to my df, the format is no longer compatible with a DSS dataframe, which is why I was trying to use managed folders.

    Essentially, given my df:

       a   b   c
    0  1   6  11
    1  2   7  12
    2  3   8  13
    3  4   9  14
    4  5  10  15

    I want to add some comment lines at the top (this is necessary to meet a file format requirement I have been given). So the output of the Python code would look like this:

    # comment
    # comment2

       a   b   c
    0  1   6  11
    1  2   7  12
    2  3   8  13
    3  4   9  14
    4  5  10  15

    Now, the above cannot be stored in a DSS dataset because it does not follow the row/column dataframe format, so I want to store this file in a managed folder with a .tab extension.

    Let me know if it is not clear.
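One way to produce the format described above in a single pass is to open the output file, write the comment lines first, and then hand the same open file handle to DataFrame.to_csv, which continues writing after the comments. This is a minimal sketch, not a DSS-specific API: the function name write_commented_tab is hypothetical, the "# " prefix and tab separator are assumptions based on the example output, and a temporary path stands in for the managed folder.

```python
import os
import tempfile

import pandas as pd


def write_commented_tab(df, path, comments):
    """Write each comment as a '# ' line, then the DataFrame as tab-separated text."""
    # pandas asks for newline="" on text handles passed to to_csv;
    # os.linesep matches the line terminator pandas uses by default
    with open(path, "w", newline="") as f:
        for comment in comments:
            f.write("# " + comment + os.linesep)
        # to_csv accepts an open handle and writes after the comment lines
        df.to_csv(f, sep="\t")


df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [6, 7, 8, 9, 10],
                   "c": [11, 12, 13, 14, 15]})
# Hypothetical output location; in DSS this would be inside the output folder
tab_path = os.path.join(tempfile.mkdtemp(), "df.tab")
write_commented_tab(df, tab_path, ["comment", "comment2"])
```

In a DSS recipe whose output folder is on the local filesystem, the path could be built the same way the question's code does it, e.g. os.path.join(output2_path, "df.tab"). The index column is written because the example output shows it; pass index=False to to_csv to omit it.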
  • Alex_Combessie
    Hi,
    Thanks for the clarification. This is more a Python question than a DSS one. You can have a look at solutions like https://stackoverflow.com/questions/5914627/prepend-line-to-beginning-of-a-file. This way you can:
    - write csv to file using pandas
    - prepend your comments to the csv text file
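The two steps above can be sketched as follows: pandas writes the table to a file, and then the file is read back and rewritten with the comment lines on top (the prepend technique from the linked Stack Overflow question). The helper name prepend_lines, the comment text and the temporary path standing in for the managed folder are hypothetical.

```python
import os
import tempfile

import pandas as pd


def prepend_lines(path, lines):
    """Insert lines at the top of an existing text file, keeping its content."""
    # Read the whole file into memory; fine for small files like this one
    with open(path, "r", newline="") as f:
        original = f.read()
    # Rewrite the file: the new lines first, then the original text unchanged
    with open(path, "w", newline="") as f:
        for line in lines:
            # os.linesep matches the line terminator pandas wrote by default
            f.write(line + os.linesep)
        f.write(original)


df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [6, 7, 8, 9, 10],
                   "c": [11, 12, 13, 14, 15]})
tab_path = os.path.join(tempfile.mkdtemp(), "df.tab")

# Step 1: write the table with pandas (tab-separated, for a .tab file)
df.to_csv(tab_path, sep="\t")
# Step 2: prepend the comment lines to the written file
prepend_lines(tab_path, ["# comment", "# comment2"])
```

Reading the whole file into memory keeps the sketch short; for very large files, streaming the original content into a new file after the comments would avoid holding everything in memory.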
  • ele_f
    Thanks. Could you just let me know whether DSS folders accept any file format? That is, can I write .tab files to a DSS folder, or can it only contain csv files?
  • Alex_Combessie
    Yes, a "DSS" folder is just a regular filesystem folder where you can store anything you want.