Hi, I am facing the issue below with a PySpark recipe. Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment …
I am trying to execute a PySpark recipe on a remote AWS EMR Spark cluster and I am getting: Your Spark settings don't define a temporary storage for yarn-cluster mode in act.compute_prepdataset1_NP: No…
Hello, As we use Spark heavily, we are seeing slow application launches in yarn cluster mode. The slowness comes from having many DSS-related files and also many jars file…
Hi all, I need to use SparkSQL and Spark for Python. I installed Spark and it is shown in the administration settings, but when I run SparkSQL it raises this error: Cannot run program "spark…
I have a table that is partitioned by date. How can I access the partition date in a PySpark recipe? I tried the following code, but it does not recognize actual_date: fct_pm_card.select("application_id", "…
Hello Dataikers! Since all visual recipes are based on SparkSQL, some "advanced" aggregations aren't available. In this case, I have 3 values in 3 columns: A, B, C. And I just want to compute Median fr…
Hello Dataiku Community, Hope all is well! Our team is looking to implement new Spark and container configuration settings on our instances. We are curious to understand what the best practices are fo…
We are using managed Spark over Kubernetes in EKS. We have about 80 active users on our design node, about half of whom use Spark regularly. We've tried to make things easy by creating simple spark con…
Hi, I'm trying to save a PySpark model with model.save("/opt/dataiku/design/managed_folders/PROJECT_TEST/9KeBcUKy/ML_SAVED") from a notebook to a managed folder, but I'm getting the following error: Py4JJavaError…