Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi everyone,
When I performs SQL query through dataiku Python API
executor = SQLExecutor2(connection="postgres")
executor.query_to_df('SELECT * from "somedb"')
it returns an
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I checked the column names and data, they are pure string or number without any accent. And I cannot find param to set encoding param for this connection.
Would you help me with this issue?
Thanks in advance.
Sorry, I removed it because it was sensitive information.
And I just tested dataikuapi.sql_query, and it works correctly. So maybe there is some error caused by dataiku.SQLExecutor2 ?
Hello, does your source DB use UTF-8 collation or something else, e.g. LATIN1?
Are you using Python 3 as code environment in this project? If so, does switching to Python 2 produce different results?
Collation checked, is en_US.UTF-8
I was using Python3, it gave me the UnicodeDecodeError. If I switch to Python2, then it shows as follow:ā
Hm, that's weird. Could you let me know more details about affected Table please. What are the column types and their content? What is the output in PgAdmin?
Hi, all tables are affected. I even test the hello world query ĀØselect 1 ;ĀØ and it returns the same error. The error only happens in Python environment, either PgAdmin or Dataiku SQL notebook can execute the query perfectly.
Hi, I wasn't able to reproduce your issue so far. I tried changing Python locale as well as Postgres DB locale/collation.
Still, I suspect locale settings might be related. Another possibility is that you have conflicting system locales on DSS server and PostgreSQL nodes (in case those are hosted on different nodes of course) . Hence, could you please
import locale
print (locale.getdefaultlocale(),'\n',locale.getlocale(),'\n',locale.getpreferredencoding())
P.s. For full code export you could download notebook like this
Here you have the notebook with its output.
It seems the forum system doesn't allow me to upload .zip, I created a gist you can see the same notebook in here:
https://gist.github.com/Qiaorui/97b87c059f040699199df3a1d4eafc88
thanks, I still can't reproduce this. What is the output of 'locale' cmd command on your DSS server and Postgres server?
Also, do you have any custom settings set for "postgres" connection? Here is mine
I don't have any custom settings in the connection. And the output of locale command are all
'en_US', 'UTF-8'
thanks, we still think this is environment-related issue; we wonder if there is a proxy involved, hence could you please include next lines before "import" code and provide the output
import os
print(os.environ)
if "http_proxy" in os.environ:
del os.environ["http_proxy"]
if "HTTP_PROXY" in os.environ:
del os.environ["HTTP_PROXY"]
Hi tested, there is no proxy on environment. But I found something interesting.
When I do
dataiku.set_remote_dss in dataiku notebook and execute the query , it will return the error.
However if I run it on dataiku notebook (same notebook as previous one) without dataiku.set_remote_dss, it executes correctly.
My goal is to use dataiku SQL query on dataiku user API, I cannot do this without set_remote_dss š
Hello, I don't see 'set_remote_dss' in https://gist.github.com/Qiaorui/97b87c059f040699199df3a1d4eafc88
Could you please update the file or provide a new link with working vs not working code?
Sorry, I removed it because it was sensitive information.
And I just tested dataikuapi.sql_query, and it works correctly. So maybe there is some error caused by dataiku.SQLExecutor2 ?
For your question, all codes are executed in the same dataiku notebook. The only difference is shown as below table.
lib | set remote dss | result |
dataiku | no | works, but can only applied inside DSS |
dataiku | yes | failed |
dataikuapi | yes | works |
I'm sorry, I don't understand why do you need to use set_remote_dss() inside a Notebook? Are you connecting to remote DSS instance from another DSS instance?
If not, you should be able to simply do client = dataiku.api_client() to get a handle of current DSS instance as per doc
Because I want to execute the SQL_executor2 inside python API, however the API is considered as outside of DSS instance, that's why I'm testing this with notebook
Could you please open a support ticket (https://doc.dataiku.com/dss/latest/troubleshooting/obtaining-support.html#editor-support-for-dataiku...) and attach a diagnostic of the DSS instance (Administration > Maintenance > Diagnostic tool)?
Note that you need to be administrator of the DSS instance - else you'll need to ask your admin
If the resulting file is too large for the suppoprt portal (> 15 MB), you can use https://dl.dataiku.com to send it to us. Please don't forget to send the link that is generated when you upload the file