UnicodeDecodeError when performs sql query with Python API
Hi everyone,
When I performs SQL query through dataiku Python API
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
it returns an
executor = SQLExecutor2(connection="postgres") executor.query_to_df('SELECT * from "somedb"')
I checked the column names and data, they are pure string or number without any accent. And I cannot find param to set encoding param for this connection.
Would you help me with this issue?
Thanks in advance.
Best Answer
-
Sorry, I removed it because it was sensitive information.
And I just tested dataikuapi.sql_query, and it works correctly. So maybe there is some error caused by dataiku.SQLExecutor2 ?
Answers
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
Hello, does your source DB use UTF-8 collation or something else, e.g. LATIN1?
Are you using Python 3 as code environment in this project? If so, does switching to Python 2 produce different results?
-
Collation checked, is en_US.UTF-8
I was using Python3, it gave me the UnicodeDecodeError. If I switch to Python2, then it shows as follow:
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
Hm, that's weird. Could you let me know more details about affected Table please. What are the column types and their content? What is the output in PgAdmin?
-
Hi, all tables are affected. I even test the hello world query ¨select 1 ;¨ and it returns the same error. The error only happens in Python environment, either PgAdmin or Dataiku SQL notebook can execute the query perfectly.
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
Hi, I wasn't able to reproduce your issue so far. I tried changing Python locale as well as Postgres DB locale/collation.
Still, I suspect locale settings might be related. Another possibility is that you have conflicting system locales on DSS server and PostgreSQL nodes (in case those are hosted on different nodes of course) . Hence, could you please
- provide output of "locale" command line output from your DSS and PostgreSQL node(s).
- your full code (if possible so I can compare it with mine) and Python 'locale' output
import locale print (locale.getdefaultlocale(),'\n',locale.getlocale(),'\n',locale.getpreferredencoding())
P.s. For full code export you could download notebook like this
-
Here you have the notebook with its output.
It seems the forum system doesn't allow me to upload .zip, I created a gist you can see the same notebook in here:
https://gist.github.com/Qiaorui/97b87c059f040699199df3a1d4eafc88
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
thanks, I still can't reproduce this. What is the output of 'locale' cmd command on your DSS server and Postgres server?
Also, do you have any custom settings set for "postgres" connection? Here is mine
-
I don't have any custom settings in the connection. And the output of locale command are all
'en_US', 'UTF-8'
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
thanks, we still think this is environment-related issue; we wonder if there is a proxy involved, hence could you please include next lines before "import" code and provide the output
import os print(os.environ) if "http_proxy" in os.environ: del os.environ["http_proxy"] if "HTTP_PROXY" in os.environ: del os.environ["HTTP_PROXY"]
-
Hi tested, there is no proxy on environment. But I found something interesting.
When I do
dataiku.set_remote_dss in dataiku notebook and execute the query , it will return the error.
However if I run it on dataiku notebook (same notebook as previous one) without dataiku.set_remote_dss, it executes correctly.
My goal is to use dataiku SQL query on dataiku user API, I cannot do this without set_remote_dss
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
Hello, I don't see 'set_remote_dss' in https://gist.github.com/Qiaorui/97b87c059f040699199df3a1d4eafc88
Could you please update the file or provide a new link with working vs not working code?
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 DataikerSo, do I read this correctly that your code works fine in Notebook but fails when being executed outside DSS? If so, you should indeed be using sql_query() for code outside DSS as stated in the doc: "This function was initially designed for usage outside of DSS and only supports returning results as an iterator. It does not support pandas dataframe"By the way, did you compare “os.environ” output for both notebook (datiaku package) and dataikuapi package code executed outside DSS?
-
For your question, all codes are executed in the same dataiku notebook. The only difference is shown as below table.
lib set remote dss result dataiku no works, but can only applied inside DSS dataiku yes failed dataikuapi yes works -
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
I'm sorry, I don't understand why do you need to use set_remote_dss() inside a Notebook? Are you connecting to remote DSS instance from another DSS instance?
If not, you should be able to simply do client = dataiku.api_client() to get a handle of current DSS instance as per doc
-
Because I want to execute the SQL_executor2 inside python API, however the API is considered as outside of DSS instance, that's why I'm testing this with notebook
-
dima_naboka Dataiker, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS Core Concepts Posts: 28 Dataiker
Could you please open a support ticket (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-dataiku-customers) and attach a diagnostic of the DSS instance (Administration > Maintenance > Diagnostic tool)?
Note that you need to be administrator of the DSS instance - else you'll need to ask your admin
If the resulting file is too large for the suppoprt portal (> 15 MB), you can use https://dl.dataiku.com to send it to us. Please don't forget to send the link that is generated when you upload the file