How do I preserve Chinese text format during CSV to Dataiku load?

I'm using Dataiku version 13.1. I have a text dataset with around 2,400 rows. Most of it is in English, but around 100 rows contain Chinese characters. My data is in CSV format. I need to perform a GenAI task on my dataset and then write the result back to CSV.
The Chinese characters are converted to English characters when the data is loaded from CSV into Dataiku.
Can you suggest steps to keep the data in its original format, as in the CSV, during the whole process?
Best Answer
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,248 Neuron
When you upload a CSV file (which is what I am assuming you are doing, as you don't say how you are loading this dataset), you can click on configure format and then on "Show advanced options", where you will be able to specify the charset (aka encoding) of the file. You will need to check with the file producer to find out what character encoding they used to create the file.
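If the file producer can't be reached, one rough way to narrow down the charset before picking it in the advanced options is to try strictly decoding the raw bytes with a few likely encodings. The sketch below is plain Python run outside Dataiku, not a Dataiku feature. The candidate list and the function names `guess_encoding` and `guess_file_encoding` are my own assumptions for illustration. Treat the result as a hint only: legacy Chinese encodings overlap, so a successful decode does not prove the guess is right, and you should still eyeball the preview after setting the charset.

```python
from typing import Iterable, Optional

# Assumed candidate list: UTF-8 (with or without BOM) first, then two
# common legacy Chinese encodings. Adjust to whatever the producer may use.
CANDIDATES = ("utf-8-sig", "gb18030", "big5")


def guess_encoding(raw: bytes, candidates: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate that decodes `raw` without errors, else None."""
    for name in candidates:
        try:
            raw.decode(name)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


def guess_file_encoding(path: str) -> Optional[str]:
    """Read the whole file as bytes and guess its encoding."""
    with open(path, "rb") as fh:
        return guess_encoding(fh.read())


if __name__ == "__main__":
    # Small in-memory sample instead of a real file path.
    sample = "id,text\n1,hello\n2,你好\n"
    print(guess_encoding(sample.encode("utf-8")))
    print(guess_encoding(sample.encode("gb18030")))
```

UTF-8 is tried first because its byte patterns are strict, so text with Chinese characters in another encoding will almost always fail to decode as UTF-8.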
Answers
-
Thanks, it works. My CSV is UTF-8 encoded. I followed your steps to import the CSV and it works. Can you please suggest how I can export it to CSV while preserving the format?
-
Turribeach
That's a new question, so please start a new thread, as this one has already been marked as answered.