Added on February 24, 2025 8:11PM
Hi everyone,
I'm facing an issue while writing a CSV file to S3 using a Sync Recipe in Dataiku. Even though the dataset looks correct inside Dataiku, when it gets saved to S3, all the data appears in one single column instead of being properly separated into multiple columns.
I checked the dataset schema, and everything seems fine. However, I don’t see an option to specify the delimiter or file format in the dataset settings. The issue occurs when I open the CSV in Excel or a text editor—it looks like the entire row is being stored as a single string.
Is there a way to force Dataiku to correctly format the CSV output when writing to S3 without using a Python recipe? Any help or insights would be greatly appreciated!
Thanks in advance!
Even though Dataiku claims to be writing the dataset output in CSV format, it is not. Dataiku datasets use a file format based on Tab Separated Values (TSV), without column headers, and the files are gzip-compressed (gzip is a compression format common on Linux). I really wish Dataiku did not claim to be writing datasets as CSV files, since it confuses a lot of users.
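If the file in S3 really is a gzipped, tab-separated file without a header row, as described above, you could download it and convert it locally. This is only a rough sketch using the Python standard library; the function name, the file paths and the column names are made up for illustration, and the exact quoting Dataiku uses may differ from what csv.reader expects by default.

```python
import csv
import gzip


def tsv_gz_to_csv(src_path, dst_path, header=None):
    """Convert a gzipped, tab-separated file into a comma-separated CSV.

    header is an optional list of column names written as the first row.
    It is a hypothetical argument: the source file is assumed to have no
    header row of its own.
    """
    with gzip.open(src_path, "rt", newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")  # input is tab-delimited
        writer = csv.writer(dst)                  # output is comma-delimited
        if header is not None:
            writer.writerow(header)
        for row in reader:
            writer.writerow(row)
```

For example, tsv_gz_to_csv("downloaded.csv.gz", "fixed.csv", header=["id", "name"]) would write a comma-separated file with a header row that Excel should split into columns.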
In any case, one option is to use Python to write the CSV with the pandas.DataFrame.to_csv() method. Since you don't want to use Python, you can use the Export to folder recipe instead, which you will find at the bottom of the right pane under Other recipes. This recipe can create real CSV files as well as Excel files, so if your target is Excel you might as well write the file in Excel format and skip the intermediate CSV. To use this recipe you will need to create a folder in your S3 connection.
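For reference, the Python route could look roughly like the sketch below. The helper dataframe_to_csv_bytes is a made-up name, and the dataset name, folder id and output file name in the comments are placeholders. The Dataiku calls are shown only as comments because they are my assumption of how a Python recipe would typically be wired up, so check them against your DSS version.

```python
import io

import pandas as pd


def dataframe_to_csv_bytes(df, sep=","):
    """Serialize a DataFrame to CSV (with header, without the index) as UTF-8 bytes."""
    buf = io.StringIO()
    df.to_csv(buf, sep=sep, index=False)
    return buf.getvalue().encode("utf-8")


# Inside a Dataiku Python recipe this might be used roughly as follows
# (assumed usage; "my_dataset", "my_s3_folder" and "output.csv" are placeholders):
#
#   import dataiku
#   df = dataiku.Dataset("my_dataset").get_dataframe()
#   folder = dataiku.Folder("my_s3_folder")
#   with folder.get_writer("output.csv") as writer:
#       writer.write(dataframe_to_csv_bytes(df))
```

Writing to a managed folder rather than to a dataset is the point here: the file lands in S3 exactly as pandas produced it, with commas and a header row.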
It is being saved as a file, not as a CSV, even after selecting CSV.
Can you please share the Python way with me?
Not sure what you mean by that. Please post screenshots and clarify what the issue is.