Error when executing SQLExecutor
UserBird
Dataiker, Alpha Tester · Posts: 535
I got an error when executing the SQLExecutor function in a Dataiku IPython notebook.
This is the command that I used:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from dataiku.core.sql import SQLExecutor2
from dataiku.core.sql import SQLExecutor
dtk = dataiku.Dataset('flight_issued_coupon_0428_0603')
SQLexe = SQLExecutor2(dataset=dtk)
df = SQLexe.query_to_df("""
select 'profile_id', 'user_id',
count(*) as cnt
from dtk group by 'profile_id','user_id'
order by cnt desc
limit 10
""")
and this is the error log that I got:
ERROR: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-127-cb7599f5d1d2> in <module>()
5 order by cnt desc
6 limit 10
----> 7 """)
/home/ubuntu/dataiku-dss-3.0.1/python/dataiku/core/sql.pyc in query_to_df(self, query, pre_queries, post_queries, extra_conf)
223
224 def query_to_df(self, query, pre_queries=None, post_queries=None, extra_conf={}):
--> 225 return _streamed_query_to_df(self._iconn, query, pre_queries, post_queries, self._find_connection_from_dataset, "sql", base.get_shared_secret(), extra_conf)
226
227 def query_to_iter(self, query, pre_queries=None, post_queries=None, extra_conf={}):
/home/ubuntu/dataiku-dss-3.0.1/python/dataiku/core/sql.pyc in _streamed_query_to_df(connection, query, pre_queries, post_queries, find_connection_from_dataset, db_type, secret, extra_conf)
39 logging.info("Got initial SQL query response")
40
---> 41 streamingSession = _handle_intercom_json_resp(resp)
42 queryId = streamingSession['queryId']
43
/home/ubuntu/dataiku-dss-3.0.1/python/dataiku/core/sql.pyc in _handle_intercom_json_resp(resp, err_msg)
15 err_data = resp.text
16 if err_data:
---> 17 raise Exception("%s: %s" % (err_msg, json.loads(err_data).get("message","No details").encode("utf8")))
18 else:
19 raise Exception("%s: %s" % (err_msg, "No details"))
Exception: Call failed: Unrecognized virtual connection type: sql
I also tried a one-liner command like this:
df = SQLexe.query_to_df("select 'profile_id', 'user_id', count(*) as cnt from dtk group by 'profile_id','user_id' order by cnt desc limit 10")
and got the same error:
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
<ipython-input-126-93f9770f53a0> in <module>()
----> 1 df = SQLexe.query_to_df("select 'profile_id', 'user_id', count(*) as cnt from dtk group by 'profile_id','user_id' order by cnt desc limit 10")
/home/ubuntu/dataiku-dss-3.0.1/python/dataiku/core/sql.pyc in query_to_df(self, query, pre_queries, post_queries, extra_conf)
223
224 def query_to_df(self, query, pre_queries=None, post_queries=None, extra_conf={}):
--> 225 return _streamed_query_to_df(self._iconn, query, pre_queries, post_queries, self._find_connection_from_dataset, "sql", base.get_shared_secret(), extra_conf)
226
227 def query_to_iter(self, query, pre_queries=None, post_queries=None, extra_conf={}):
/home/ubuntu/dataiku-dss-3.0.1/python/dataiku/core/sql.pyc in _streamed_query_to_df(connection, query, pre_queries, post_queries, find_connection_from_dataset, db_type, secret, extra_conf)
39 logging.info("Got initial SQL query response")
40
---> 41 streamingSession = _handle_intercom_json_resp(resp)
42 queryId = streamingSession['queryId']
43
/home/ubuntu/dataiku-dss-3.0.1/python/dataiku/core/sql.pyc in _handle_intercom_json_resp(resp, err_msg)
15 err_data = resp.text
16 if err_data:
---> 17 raise Exception("%s: %s" % (err_msg, json.loads(err_data).get("message","No details").encode("utf8")))
18 else:
19 raise Exception("%s: %s" % (err_msg, "No details"))
Exception: Call failed: Unrecognized virtual connection type: sql
Answers
A little late, but here is my thought: the dataset you're trying to reach might be a Hive one, so you should use the HiveExecutor instead of the SQL one.
To check this, just go to your Flow and see which recipes DSS allows you to create from your dataset.
Hope this might help someone :-)
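For reference, a minimal sketch of what the Hive variant could look like, assuming the dataset is indeed Hive-backed, that HiveExecutor is importable from dataiku.core.sql in this DSS version, and that the Hive table shares the dataset's name (all assumptions, not verified against DSS 3.0.1). The sketch also fixes two issues in the original query: single quotes make 'profile_id' a string literal rather than a column reference (so the query grouped by constants), and "from dtk" refers to the Python variable, not an actual table name.

```python
# Hedged sketch: querying a Hive-backed dataset with HiveExecutor
# instead of SQLExecutor2. The import path and table name are
# assumptions based on the question above.

def build_query(table):
    # Unquoted names are treated as identifiers; single-quoted names
    # would be string literals, which is why the original query was
    # effectively grouping by constants.
    return (
        "SELECT profile_id, user_id, COUNT(*) AS cnt "
        "FROM " + table + " "
        "GROUP BY profile_id, user_id "
        "ORDER BY cnt DESC LIMIT 10"
    )

try:
    import dataiku
    from dataiku.core.sql import HiveExecutor  # only available inside DSS

    ds = dataiku.Dataset("flight_issued_coupon_0428_0603")
    executor = HiveExecutor(dataset=ds)
    # query_to_df has the same shape as SQLExecutor2.query_to_df
    df = executor.query_to_df(build_query("flight_issued_coupon_0428_0603"))
    print(df.head())
except ImportError:
    # Running outside DSS: just show the query that would be sent.
    print(build_query("flight_issued_coupon_0428_0603"))
```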