Wrong charset detected with UTF-16 CSV file
Romain_NIO
Registered Posts: 12 ✭✭✭✭
Hi,
I have to deal with a CSV file encoded in UTF-16 because it contains specific characters (Asian characters).
DSS detects the wrong charset (ISO-8859-15) instead of UTF-16. As a result, the name of the first column is invalid, probably because the BOM of the CSV is interpreted as part of the first column name.
Fortunately, I can manually edit the charset and it works. But in several "automatic" cases I will not be there to edit it.
Is there a way to correct that?
To reproduce the issue, you can download this file: http://www.filedropper.com/romain
NOTE: The Unix "file" command correctly identifies the file as "Little-endian UTF-16 Unicode text, with CRLF line terminators".
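For illustration, the BOM check that the "file" command appears to rely on can be sketched in a few lines of Python. This is only a minimal sketch, not how DSS detects charsets: the names `sniff_bom_encoding` and `decode_csv_bytes` are hypothetical, and falling back to ISO-8859-15 when no BOM is found is an assumption made to mirror the behaviour described above.

```python
import codecs

# BOM signatures, checked longest first so that a UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for a UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(raw: bytes, default: str = "iso-8859-15") -> str:
    """Return a Python codec name based on the BOM, or `default` if none.

    The returned codecs ("utf-16", "utf-32", "utf-8-sig") consume the BOM
    while decoding, so it never leaks into the first column name.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default  # assumption: no BOM means a legacy single-byte charset


def decode_csv_bytes(raw: bytes) -> str:
    """Decode raw CSV bytes using the BOM-sniffed encoding."""
    return raw.decode(sniff_bom_encoding(raw))
```

A preprocessing step could use something like this to re-encode the file as UTF-8 before it is uploaded, which would avoid depending on automatic charset detection at all.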
Answers
-
Hello Dataiku team,
This issue is not resolved. Is there any chance of fixing it in the next minor release?
-
Hi, Would you be able to send us a sample of this file so we can try to reproduce the issue? You can add it to this thread using a link from a file transfer service such as WeTransfer, or send it to me at alexandre dot combessie at dataiku dot com. Cheers, Alex
-
Hi Alex,
Thanks for your help. I sent you a sample file by email.
-
Thanks, I have been able to reproduce this issue. I have reported this to our R&D team.
-
This issue will be fixed in the forthcoming 5.1.3 release.