"Connection Refused" when querying value via Pandas on Spark
Rushil09
Partner, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Frontrunner 2022 Participant Posts: 17 Partner
Hi,
When I try to look up a value using the pandas API on Spark, I get a "Connection refused" error.
The dataset has 24 lakh (2.4 million) rows. The same lookup works fine when I run it with plain pandas or core PySpark.
Can someone help?
Operating system used: Ubuntu
(Topic title edited by moderator to be more descriptive. Original title "Spark")
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,226 Dataiker
Hi @Rushil09,
Can you please share a snippet of your actual code and tell us which Spark version you are using?
Are you trying to use the Pandas API on Spark?
https://spark.apache.org/docs/3.2.1/api/python/getting_started/quickstart_ps.html
When interacting with PySpark in DSS, you should use the Dataiku API together with the Spark APIs:
https://doc.dataiku.com/dss/latest/code_recipes/pyspark.html#anatomy-of-a-basic-pyspark-recipe
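As a rough illustration of that pattern, here is a minimal sketch that reads a DSS dataset through the Dataiku Spark helper and then filters it with the pandas API on Spark. It assumes Spark 3.2 or later (needed for `pandas_api()`) and a Spark-enabled DSS code environment. The function name `find_rows_with_value` and its parameters are hypothetical, and it is not necessarily what your recipe looks like.

```python
def find_rows_with_value(input_name, column, value):
    """Return rows of a DSS dataset where `column` equals `value`.

    Hypothetical helper for illustration only. The result is a
    pandas-on-Spark DataFrame, so the filter runs on the Spark cluster.
    """
    # Imported here because these modules only resolve inside a
    # Spark-enabled DSS recipe or notebook.
    import dataiku
    import dataiku.spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    # Reuse the Spark context that DSS configured for this recipe.
    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    # Read the dataset as a Spark DataFrame via the Dataiku API,
    # rather than loading it into local pandas first.
    dataset = dataiku.Dataset(input_name)
    spark_df = dkuspark.get_dataframe(sqlContext, dataset)

    # Switch to the pandas API on Spark (assumes Spark >= 3.2).
    ps_df = spark_df.pandas_api()

    # pandas-style boolean filtering, evaluated lazily by Spark.
    return ps_df[ps_df[column] == value]
```

Inside a recipe this could be called as `find_rows_with_value("my_dataset", "customer_id", 42)`, where the dataset and column names are placeholders for your own.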
Thanks