Convert a csv file to a tab file

Solved!
raia

I have a CSV file in my flow that I would like to export as a tab-separated file. Is this possible with a Python recipe? If so, what would that recipe look like?

Thanks!


Operating system used: Windows

dgraham
Dataiker

Hi @raia ,

You can export the dataset from your flow and download it as a TSV (Tab Separated Values) file by clicking the "Export" button and specifying "\t" as the separator.

Alternatively, from a Python code recipe, you can write the CSV dataset to a managed folder as a TSV file using the pandas library, as in this example code:

import dataiku

# Read the recipe input into a pandas DataFrame
comma_separated_file = dataiku.Dataset("comma_separated_file")
df = comma_separated_file.get_dataframe()

# Name of the tab-separated output file
file_name = 'tab_separated_file.tsv'

# Write the DataFrame to the managed folder as UTF-8 encoded TSV
output_folder = dataiku.Folder("output_folder")
with output_folder.get_writer(f"/{file_name}") as writer:
    writer.write(df.to_csv(sep='\t', index=False).encode("utf-8"))
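
If you want to check what the tab-separated output should look like before running the recipe, here is a minimal sketch that does the same comma-to-tab conversion using only Python's standard `csv` module instead of pandas and the Dataiku API. The function name `csv_to_tsv` and the sample data are made up for illustration; they are not part of the recipe above.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-separated text."""
    # newline="" leaves line-ending handling to the csv module.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: the quoted field contains a comma.
sample = 'id,name\n1,"Doe, Jane"\n2,Smith\n'
print(csv_to_tsv(sample))
```

Parsing with `csv.reader` rather than replacing commas directly matters when a quoted field contains a comma, as in the sample row: the comma inside the field is kept, and only the field separators become tabs.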

 

 
