Using Dataiku
1 - 10 of 34
- Hello, as we use Spark heavily, we are seeing slow application launches in YARN cluster mode. The slowness comes from having many DSS-related files and also many JAR file…
- Hi all, I need to use SparkSQL and Spark for Python. I installed Spark and it shows in the administration settings, but when I run SparkSQL it raises this error: Cannot run program "spark…Last answer by Catalina
Hi,
This issue usually occurs when the Spark integration was not re-run after a recent upgrade, so DSS cannot find spark-submit.
Please re-run the Spark integration using the standalone archive for your DSS version, downloaded from the Dataiku DSS download site, with the following command:
/data/bin/dssadmin install-spark-integration -standaloneArchive PATH_TO/dataiku-dss-spark-standalone-12.2.0-3.4.1-generic-hadoop3.tar.gz
This is explained in the Setup Spark documentation.
- There does not appear to be a way to write Spark job output to disk using a set partition scheme. This is normally done via dataframe.write.parquet(<path>, partitionBy=['year']), if one is to partition the …Last answer by
- Hi, the API you recently created and detailed here: https://academy.dataiku.com/latest/tutorial/code/pycharm.html does not appear to support any recipe other than those made in Python. If I connect per the …Last answer by
- Writing a Spark df to CSV along with headers repartitions the df to 1 by default, so writing a large dataset takes a long time because only one partition is active. How do …
- From this great tutorial on building webapps http://learn.dataiku.com/howto/code/webapps/use-python-backend.html I can see that I can use Python at the backend for larger volumes of data. Does this ex…Last answer by