-
PYSPARK_PYTHON environment variable issue in PySpark
Hi, I am facing the issue below with a PySpark recipe: Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set. I have set the environment variables using…
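A minimal sketch of the usual fix in plain PySpark, assuming both sides should run the driver's Python 3.6 interpreter (the interpreter path below is an assumption, not the poster's actual layout):

    import os

    # Both variables must point at the same Python 3 interpreter, and must be
    # set before the SparkContext/SparkSession is created, not after.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.6"        # assumed path
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.6" # assumed path

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("version-check").getOrCreate()
    # Sanity check: the Python major.minor version Spark sees on the workers
    print(spark.sparkContext.pythonVer)

In a Dataiku recipe this is typically governed by the code environment and the Spark configuration rather than ad-hoc exports, so the sketch is mainly useful for confirming where the mismatch comes from.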
-
import packages warnings
When I import packages, I get the warnings shown in the attached pictures. What are they and how can I get rid of them?
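Without the screenshots it is hard to say which warnings these are; assuming they are ordinary Python warnings such as deprecation notices (and not Spark/log4j output), a sketch of silencing them:

    import warnings

    # Only applies to Python warnings; Spark/log4j messages are controlled
    # through the logging configuration instead.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    warnings.filterwarnings("ignore", category=FutureWarning)

    import pandas as pd  # example of an import that previously emitted warnings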
-
No connection defined to upload files/jars
I am trying to execute a PySpark recipe on a remote AWS EMR Spark cluster and I am getting: Your Spark settings don't define a temporary storage for yarn-cluster mode in act.compute_prepdataset1_NP: No connection defined to upload files/jars. I am using this runtime configuration: I also tried adding: spark.yarn.stagingDir…
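For reference, a sketch of what the staging-directory property looks like on the Spark side; the HDFS path is a placeholder, and in Dataiku the temporary storage for yarn-cluster mode is normally defined through a connection in the Spark settings rather than in recipe code:

    from pyspark.sql import SparkSession

    # spark.yarn.stagingDir controls where Spark uploads files/jars in
    # yarn-cluster mode; the path below is an assumed example.
    spark = (
        SparkSession.builder
        .appName("emr-staging-sketch")
        .config("spark.yarn.stagingDir", "hdfs:///tmp/spark-staging")
        .getOrCreate()
    )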
-
Best Practices For Updating and Renaming Spark and Container Configurations
Hello Dataiku Community, hope all is well! Our team is looking to implement new Spark and container configuration settings on our instances, and we are curious what the best practices are for updating the existing configurations. For context, we have existing Spark configurations already being used by end users,…
-
General / Rule of Thumb Spark Configuration Settings
We are using managed Spark over Kubernetes on EKS. We have about 80 active users on our design node, and about half of them use Spark regularly. We've tried to make things easy by creating simple Spark configurations, but we find ourselves continuously changing them. With multiple Spark applications, has anyone…
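As a purely illustrative starting point (the sizes below are assumptions for a Kubernetes setup, not Dataiku recommendations), a baseline configuration often looks something like:

    from pyspark.sql import SparkSession

    # Hypothetical baseline for mixed workloads on Kubernetes; in Dataiku these
    # keys would normally live in a named Spark configuration, not recipe code.
    spark = (
        SparkSession.builder
        .config("spark.executor.memory", "8g")
        .config("spark.executor.cores", "2")
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.maxExecutors", "20")
        .getOrCreate()
    )

Dynamic allocation lets small interactive jobs shrink while still allowing heavier jobs to scale out, which tends to reduce how often the base configuration needs retuning.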
-
How to save Pyspark model from notebook to managed folder
Hi, I'm trying to save a PySpark model with model.save("/opt/dataiku/design/managed_folders/PROJECT_TEST/9KeBcUKy/ML_SAVED") from a notebook to a managed folder, but I'm getting the following error: Py4JJavaError: An error occurred while calling o2981.save.: org.apache.spark.SparkException: Job aborted. at…
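One pattern sometimes used for local (non-remote) managed folders, sketched here with an assumed folder id and assuming Spark runs in local mode on the Dataiku node, is to resolve the folder path through the API instead of hard-coding it:

    import os
    import dataiku

    # "my_folder_id" is a placeholder; get_path() only works for folders
    # stored on the local filesystem of the Dataiku node.
    folder = dataiku.Folder("my_folder_id")
    target = os.path.join(folder.get_path(), "ML_SAVED")

    # 'model' is the fitted MLlib model from the notebook. Spark executors do
    # the writing, so a plain local path is only reliable when Spark runs in
    # local mode; on a cluster, save to HDFS/S3 instead.
    model.save("file://" + target)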
-
SQL Spark integration
Where is the "SQL Spark integration" > "Enable direct access" settings documented? When setting it to "For reads" I get the following error: Invalid connection configuration The driver for the connection needs to be passed manually to Spark Where do pass the driver for the connection manually? Can I pass the Maven…
-
Pyspark and python error
I was trying to execute a PySpark script and encountered a Py4J error. Can someone help me with this? I have checked all the version compatibilities as well. I am attaching a screenshot of the error. Operating system used: Ubuntu
-
PySpark Setup via Dataiku: dkuspark.getdataframe() error
Hi all, I'm just starting out on PySpark (and on Dataiku), and debugging via both the Dataiku and PySpark documentation has been quite a challenge. After a lot of searching, it seems my error may be specific to the Dataiku platform. I want to convert a table from a Redshift/SQL server that I defined in my Dataiku…
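For comparison, the usual pattern from the Dataiku PySpark documentation (the dataset name below is a placeholder) reads a Flow dataset into a Spark DataFrame with get_dataframe:

    import dataiku
    from dataiku import spark as dkuspark
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext.getOrCreate()
    sqlContext = SQLContext(sc)

    dataset = dataiku.Dataset("my_redshift_table")      # placeholder name
    df = dkuspark.get_dataframe(sqlContext, dataset)    # note: get_dataframe
    df.printSchema()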
-
Accessing Spark web UI
Hello, I am a beginner in Spark and I am trying to set up Spark on our Kubernetes cluster. The cluster is now working and I can run Spark jobs; however, I want to access the Spark web UI to inspect how my job is being distributed. We usually port-forward a port (4040), but I am not able to check which pod is the driver pod…
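One way to locate the UI from inside the job itself, sketched below, is to ask the running SparkContext for its UI address; Spark on Kubernetes also labels the driver pod with spark-role=driver, which can be used to find it for port-forwarding:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # On Kubernetes this URL points at the driver pod (default port 4040)
    print(spark.sparkContext.uiWebUrl)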