$ hadoop version Hadoop 3.1.2 Source code repository https://github.com/apache/hadoop.git -r 1019dde65bcf12e05ef48ac71e84550d589e5d9a Compiled by sunilg on 2019-01-29T01:39Z Compiled with protoc 2.5.0…
I am trying to execute a PySpark recipe on a remote AWS EMR Spark cluster and I am getting: Your Spark settings don't define a temporary storage for yarn-cluster mode in act.compute_prepdataset1_NP: No…
Greetings! I'm currently on a platform with Dataiku 11.3.1 and writing datasets to HDFS. IT requires all datasets to be written in Parquet, but the default setting is CSV (Hive), and it can generate…
Hi, currently when we write to the Dataiku file system we only have the CSV and Avro formats. How can I enable the Parquet format in Dataiku DSS running on a Linux platform on an EC2 instance? I need the steps for that. Also we…
I am trying to install the standalone Hadoop integration for Dataiku. My Dataiku instance is hosted on a Linux server and when I follow the directions for standalone installation here (Setting up Hado…
How can I quickly update the code environment, upload a zipped certificate file to the resources directory, and then make the certificate file accessible during runtime? I upload the file, modify the …
Hello, I am a beginner in Spark and I am trying to set up Spark on our Kubernetes cluster. The cluster is now working and I can run Spark jobs; however, I want to access the Spark web UI to inspect how my …
I have set up an HDFS connection to access a Google Cloud Storage bucket on which I have Parquet files. After adding GoogleHadoopFileSystem to the Hadoop configuration I can access the bucket and files…
Hi all, thanks for the great support. I would like to install DSS on a different server where Spark and HDFS do not exist. Is it possible to integrate DSS with a remote server where Spark and HDFS exist …
The install guide talks about Spark on Hadoop-YARN. Does Dataiku support Spark on a Kubernetes cluster? If yes, is a Hadoop install required? Please link any doc or article on this matter if ava…