-
Spark Cluster mode
Hello, as we use Spark heavily, we are running into slow application launches in YARN cluster mode. The slowness comes from the many DSS-related files and jar files that have to be uploaded for every single Spark application. We checked the feature of using cluster mode. However, we know that…
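One common mitigation for the repeated jar uploads, independent of DSS, is to pre-stage the Spark jars on HDFS once and point spark.yarn.archive at them so they are not re-uploaded on every submission. A minimal sketch, with a hypothetical HDFS path:

```python
# Sketch: stop YARN from re-uploading Spark jars on every submit by staging
# them once on HDFS (e.g. `hdfs dfs -put spark-jars.zip /libs/`) and pointing
# the standard spark.yarn.archive setting at that archive. The path below is
# an assumption for illustration.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("yarn-jar-staging-sketch")
    .config("spark.yarn.archive", "hdfs:///libs/spark-jars.zip")
    .getOrCreate()
)
```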
-
Spark Setup
Hi all, I need to use SparkSQL and Spark for Python. I installed Spark and it is shown in the administration settings, but when I run the SparkSQL recipe it raised this error: Cannot run program "spark-submit" (in directory "/data/design/jobs/DC Can anyone help, or send an article to follow for the configuration? Thanks in…
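This error usually means spark-submit is not on the PATH of the process running DSS. A quick diagnostic sketch you could run from any Python recipe or notebook (exact install locations vary, so the comment's suggested fix is an assumption):

```python
# Sketch: check whether spark-submit is visible to the DSS process, the usual
# cause of 'Cannot run program "spark-submit"'.
import os
import shutil

path = shutil.which("spark-submit")
if path is None:
    # Likely fix: add your Spark installation's bin/ directory to PATH and
    # re-run the DSS Spark integration setup. Locations vary by install.
    print("spark-submit not found; PATH =", os.environ.get("PATH"))
else:
    print("spark-submit found at", path)
```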
-
Access the partition value in Pyspark Recipe
I have a table that is partitioned by date. How can I access the partition date in a PySpark recipe? I tried the following code, but it does not recognize actual_date: fct_pm_card.select("application_id", "product").filter(col('actual_date') <= end_date)
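For context: in a DSS PySpark recipe the partition being built is exposed as a flow variable rather than as a dataframe column, so it can be re-attached with lit(). A sketch, assuming a time-based "date" dimension (inspect dataiku.dku_flow_variables in your recipe for the actual key names):

```python
# Sketch: fetch the partition value from DSS flow variables and re-attach it
# as a column, since partition columns are not stored in the data itself.
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import lit

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# "DKU_DST_DATE" assumes a time-based partition dimension; the available
# keys depend on how the dataset is partitioned.
actual_date = dataiku.dku_flow_variables["DKU_DST_DATE"]

# "fct_pm_card" is the dataset name from the question.
fct_pm_card = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("fct_pm_card"))
df = fct_pm_card.withColumn("actual_date", lit(actual_date))
```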
-
Use Custom UDFs on Visual Recipes
Hello Dataikers! Since all visual recipes are based on SparkSQL, some "advanced" aggregations aren't available. In my case, I have 3 values in 3 columns: A, B, and C, and I just want to compute their median. The problem is that a median function doesn't exist in my current Spark backend version, so I need to use a UDF to do…
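As a sketch of the UDF route in a PySpark recipe (rather than a visual recipe): for exactly three values the median is simply the middle element after sorting, so a small Python UDF suffices. Column names A, B, C follow the question:

```python
# Sketch: median of three columns as a Python UDF. For three values the
# median is the middle element after sorting.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

@udf(returnType=DoubleType())
def median3(a, b, c):
    # Minimal null handling for the sketch.
    if a is None or b is None or c is None:
        return None
    return float(sorted([a, b, c])[1])

# Toy data standing in for the real dataset.
df = spark.createDataFrame([(1.0, 5.0, 3.0), (2.0, 2.0, 8.0)], ["A", "B", "C"])
df.withColumn("median", median3(col("A"), col("B"), col("C"))).show()
```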
-
Pyspark Code to Spark Dataframe
I am new to Dataiku. I have been going up and down the Dataiku documentation trying to make sense of it, especially the Dataiku PySpark code recipe for my project, but I cannot find anything useful! I am looking for examples and syntax as simple as how to convert a Dataiku dataset to a Spark dataframe!…
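For reference, the conversion asked about is a couple of lines with the dataiku.spark helpers; a minimal sketch (the dataset names are placeholders):

```python
# Sketch: load a DSS dataset as a Spark dataframe inside a PySpark recipe,
# and write one back. Dataset names are placeholders.
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

df = dkuspark.get_dataframe(sqlContext, dataiku.Dataset("my_input_dataset"))

# ... transform df with the usual Spark dataframe API ...

dkuspark.write_with_schema(dataiku.Dataset("my_output_dataset"), df)
```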
-
Control write partitioning with Spark
There does not appear to be a way to write Spark output to disk using a set partition scheme. This is normally done via dataframe.write.parquet(<path>, partitionBy=['year']) if one wants to partition the data by year, for example. I am looking at the API page here: https://doc.dataiku.com/dss/latest/python-api/pyspark.html,…
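One workaround, at the cost of bypassing the managed-dataset writer, is to use Spark's own writer against a path you control; a sketch with a hypothetical output path:

```python
# Sketch: write with an explicit partition scheme using the plain Spark API.
# The output path is an assumption, and files written this way are not
# tracked by DSS as a managed dataset.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2020, "a"), (2021, "b")], ["year", "value"])

(
    df.write
    .mode("overwrite")
    .partitionBy("year")                    # one subdirectory per year value
    .parquet("hdfs:///tmp/example_output")  # hypothetical path
)
```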
-
PyCharm API not supporting PySpark recipes
Hi, the API you recently created and detailed here: https://academy.dataiku.com/latest/tutorial/code/pycharm.html does not appear to support any recipe type other than Python. If I connect per the instructions, I can only see recipes that were written in Python. Since there is no real difference between PySpark and Python…
-
Spark pipeline merge rules
What kinds of visual recipes can be merged together during job execution?
-
Unable to write spark df to csv with column headers and multiple partitions?
Writing a Spark dataframe to CSV with headers repartitions the dataframe into 1 partition by default, so writing takes a lot of time on a large dataset because only one partition is active. How do I write a Spark dataframe to CSV on HDFS with column headers and multiple partitions, so that it runs faster?
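For reference, Spark's CSV writer does not require a single partition to emit headers: with header=true each part file gets its own header row, so coalescing to one partition is only needed when exactly one output file is required. A sketch with a hypothetical path:

```python
# Sketch: write CSV with headers while keeping multiple partitions, so all
# partitions write in parallel; each part-*.csv file gets its own header row.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(0, 1_000_000).withColumnRenamed("id", "value")

(
    df.write
    .mode("overwrite")
    .option("header", "true")
    .csv("hdfs:///tmp/example_csv")  # hypothetical path; yields many files
)
```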
-
WebApps with PySpark at backend
From this great tutorial on building webapps http://learn.dataiku.com/howto/code/webapps/use-python-backend.html I can see that I can use Python in the backend for larger volumes of data. Does this extend to PySpark?