When I perform a SQL query through the Dataiku Python API:

```python
from dataiku import SQLExecutor2

executor = SQLExecutor2(connection="postgres")
executor.query_to_df('SELECT * from "somedb"')
```

it raises a `UnicodeDecodeError`:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
```
I checked the column names and the data: they are plain strings or numbers without any accented characters. I also cannot find a parameter to set the encoding for this connection.
Could you help me with this issue?
Thanks in advance.
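A hedged observation that nobody in this thread confirms: `0x8b` is the second byte of the gzip magic number (`1f 8b`), and an "invalid start byte" error at position 1 is exactly what you get when gzip-compressed bytes are decoded as UTF-8 text. If that is what happens here, the offending bytes would come from a compressed response body rather than from the table data. The stdlib-only sketch below reproduces the same message; the helper `decode_maybe_gzip` and the sample payload are hypothetical and not part of the Dataiku API.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def decode_maybe_gzip(body: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decompress `body` if it starts with the gzip magic, then decode it."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return body.decode(encoding)


# Simulate a compressed response body (the JSON content is made up)
compressed = gzip.compress(b'{"rows": [[1]]}')

naive_error = None
try:
    compressed.decode("utf-8")  # naive decode, as if the body were plain text
except UnicodeDecodeError as exc:
    naive_error = str(exc)

print(naive_error)
print(decode_maybe_gzip(compressed))
```

If this hypothesis is right, changing the database encoding would not help; it would be more useful to check what sits between the notebook and the DSS instance it talks to.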
Hello, does your source DB use UTF-8 collation or something else, e.g. LATIN1?
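The reasoning behind the collation question can be sketched in plain Python: pure ASCII bytes decode identically under UTF-8 and LATIN1, whereas a LATIN1-encoded accented character is not valid UTF-8 on its own. The sample strings are illustrative only.

```python
ascii_bytes = "hello 123".encode("latin-1")
accented_bytes = "café".encode("latin-1")  # 'é' becomes the single byte 0xe9

# Pure ASCII data reads the same whichever of the two encodings is assumed
same_ascii = ascii_bytes.decode("utf-8") == ascii_bytes.decode("latin-1")

# A lone LATIN1 accented byte is not a complete UTF-8 sequence
try:
    accented_bytes.decode("utf-8")
    accented_ok = True
except UnicodeDecodeError:
    accented_ok = False

print(same_ascii, accented_ok)  # True False
```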
Are you using Python 3 as the code environment in this project? If so, does switching to Python 2 produce different results?
Hi, all tables are affected. I even tested the trivial query `select 1;` and it returns the same error. The error only occurs in the Python environment; both pgAdmin and the Dataiku SQL notebook execute the query without problems.
Hi, so far I haven't been able to reproduce your issue. I tried changing the Python locale as well as the Postgres DB locale/collation.
Still, I suspect locale settings might be involved. Another possibility is that the DSS server and the PostgreSQL nodes have conflicting system locales (if they are hosted on different nodes, of course). Could you please run the following and share the output:
```python
import locale

# Note: locale.getdefaultlocale() is deprecated since Python 3.11
print(locale.getdefaultlocale(), "\n", locale.getlocale(), "\n", locale.getpreferredencoding())
```
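A slightly broader variant of the same check, offered as a sketch: it also reports the interpreter-level encodings from `sys` and avoids the deprecated `locale.getdefaultlocale()`. The function name `encoding_report` is hypothetical.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect locale and interpreter encoding settings for the current process."""
    return {
        "getlocale": locale.getlocale(),
        # do_setlocale=False: report the encoding without changing the locale
        "preferred_encoding": locale.getpreferredencoding(False),
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```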
P.S. To export the full code, you can download the notebook.
Here is the notebook with its output.
It seems the forum doesn't allow uploading .zip files, so I created a gist with the same notebook here:
Thanks, I still can't reproduce this. What is the output of the `locale` command on your DSS server and on your Postgres server?
Also, do you have any custom settings on the "postgres" connection? Here are mine:
Thanks. We still think this is an environment-related issue, and we wonder whether a proxy is involved. Could you please add the following lines before your `import` statements and share the output?
```python
import os

print(os.environ)

# Remove HTTP proxy settings for this Python process only
if "http_proxy" in os.environ:
    del os.environ["http_proxy"]
if "HTTP_PROXY" in os.environ:
    del os.environ["HTTP_PROXY"]
```
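Since a full `os.environ` dump can contain secrets, a narrower variant may be easier to share. This is a generic sketch, not anything Dataiku-specific: it looks for proxy-related variables in any casing (including `https_proxy`, `all_proxy`, and `no_proxy`, which the snippet above does not touch), removes them, and prints only their names. The helper name `strip_proxy_env` is hypothetical.

```python
import os

PROXY_VARS = {"http_proxy", "https_proxy", "all_proxy", "no_proxy"}


def strip_proxy_env(environ=os.environ) -> dict:
    """Hypothetical helper: remove proxy variables (any casing) and return what was removed."""
    found = {key: environ[key] for key in list(environ) if key.lower() in PROXY_VARS}
    for key in found:
        # With os.environ, this affects only this process and children it starts later
        del environ[key]
    return found


removed = strip_proxy_env()
# Print only the variable names, not their values, in case they embed credentials
print(sorted(removed) or "no proxy variables set")
```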
Hi, I tested it: there is no proxy in the environment. But I found something interesting.
When I call `dataiku.set_remote_dss` in a Dataiku notebook and then execute the query, it returns the error.
However, if I run it in the same notebook without `dataiku.set_remote_dss`, it executes correctly.
My goal is to run Dataiku SQL queries through the Dataiku user API, and I cannot do that without `set_remote_dss` 😞
Sorry, I removed it because it was sensitive information.
I also just tested `dataikuapi.sql_query`, and it works correctly. So maybe the error is caused by `dataiku.SQLExecutor2`?
To answer your question: all the code is executed in the same Dataiku notebook. The only differences are shown in the table below.
| lib | set_remote_dss | result |
|---|---|---|
| dataiku | no | works, but can only be used inside DSS |
| dataiku | yes | fails with `UnicodeDecodeError` |
Sorry, I don't understand why you need to use `set_remote_dss()` inside a notebook. Are you connecting to a remote DSS instance from another DSS instance?
If not, you should be able to simply call `client = dataiku.api_client()` to get a handle on the current DSS instance, as described in the documentation.
Could you please open a support ticket (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-dataiku...) and attach a diagnostic of the DSS instance (Administration > Maintenance > Diagnostic tool)?
Note that you need to be an administrator of the DSS instance; otherwise, you'll need to ask your admin.
If the resulting file is too large for the support portal (> 15 MB), you can use https://dl.dataiku.com to send it to us. Please don't forget to send the link that is generated when you upload the file.