Wrong charset detected with UTF-16 CSV file

Level 2
Hi,

I have to deal with a CSV file encoded in UTF-16 because it contains specific characters (Asian characters).

DSS detects the wrong charset (iso-8859-15) instead of UTF-16. As a consequence, the name of the first column is invalid, probably because the BOM of the CSV file is interpreted as part of the first column name.

Fortunately, I can manually edit the charset and then it works. But in several "automatic" cases I will not be there to edit it 🙂

Is there a way to correct that?

To reproduce the issue, you can download this file: http://www.filedropper.com/romain

NOTE: The file is correctly identified by the Unix "file" command as "Little-endian UTF-16 Unicode text, with CRLF line terminators"
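
For the "automatic" cases, one possible workaround outside DSS is to check the BOM yourself and re-encode the file to UTF-8 before it is ingested. The snippet below is only a minimal sketch using the Python standard library, not a DSS API. The names sniff_bom and to_utf8 are hypothetical helpers made up for this illustration, and falling back to iso-8859-15 when no BOM is found is an assumption that simply mirrors what DSS guessed here.

```python
import codecs

# BOM signatures, longest first: the UTF-32 LE BOM starts with the same
# two bytes as the UTF-16 LE BOM, so it has to be tested before it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw: bytes, default: str = "iso-8859-15") -> str:
    """Return a Python codec name based on the BOM at the start of raw.

    The returned codecs ("utf-16", "utf-32", "utf-8-sig") consume the BOM
    while decoding, so it does not leak into the first column name.
    If no BOM is present, the (assumed) default is returned.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


def to_utf8(src_path: str, dst_path: str) -> str:
    """Re-encode src_path to UTF-8 (without BOM) at dst_path.

    Returns the codec that was used to read the source file.
    """
    with open(src_path, "rb") as f:
        encoding = sniff_bom(f.read(4))
    # newline="" keeps the original CRLF line terminators untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for chunk in iter(lambda: src.read(65536), ""):
            dst.write(chunk)
    return encoding
```

With a little-endian UTF-16 file like the one above, sniff_bom returns "utf-16", and the converted copy can then be read as plain UTF-8. Decoding the same bytes as iso-8859-15 instead turns the two BOM bytes into the characters "ÿþ" at the start of the header line, which is consistent with the broken first column name.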
5 Replies
Level 2
Hello Dataiku team,

This issue is not resolved. Any chance to correct this in the next minor release?
Dataiker
This issue will be fixed in the forthcoming 5.1.3 release.
Dataiker
Thanks, I have been able to reproduce this issue. I have reported this to our R&D team.
Level 2
Hi Alex,

Thanks for your help. I sent you an example file by email.
Dataiker
Hi,

Would you be able to send us a sample of this file so we can try to reproduce the issue? You can add it to this thread using a link from a file transfer service such as WeTransfer, or send it to me at alexandre dot combessie at dataiku dot com.

Cheers,
Alex