-
No connection defined to upload files/jars
I am trying to execute a PySpark recipe on a remote AWS EMR Spark cluster and I am getting: "Your Spark settings don't define a temporary storage for yarn-cluster mode in act.compute_prepdataset1_NP: No connection defined to upload files/jars". I am using this runtime configuration: I also tried adding: spark.yarn.stagingDir…
-
How to set up an Athena connection using an S3 connection
Hello, I've already set up an S3 connection, which is working perfectly. I'd like to use it to set up an Athena connection. My S3 connection uses the sts-assume-role type. I use Dataiku Cloud Stacks (AWS). Can you tell me why things do not work? Do I need specific policies in the roles for S3 and the instance profile? The error is shown in the screenshots.
-
EC2 Dataiku connection to Azure Synapse pool - Need help
Hey all, I need some help figuring out where the source of an issue lies with a particular connection to a Synapse SQL pool. Setup: Dataiku hosted on AWS EC2, built using Ansible + Terraform; Synapse SQL pool located on Azure; everything is (supposed to be) on the same internal network. I'm getting the following error…
-
Questions on Dataiku, EKS, and EMR Serverless for Efficient Data Processing
Hi Dataiku Community, I hope you're all doing well. I wanted to reach out with some questions regarding our current implementation, where we are using EMR Serverless with TBs of data flowing between Snowflake and S3. * Intermediate Datasets Avoidance: We are looking into an EKS setup compared to EMR Serverless. How does…
-
org.hibernate.HibernateException: More than one row with the given identifier was found
After trying to create EC2 instances in Fleet Manager, almost every Fleet Manager menu (Instances-All, Instance templates, Virtual networks, License management, etc.) shows the error below. I guess the instance IDs from the AWS EC2 servers were duplicated, which causes the error. However, I can't find any database for Fleet…
-
What ports need to be open for Elastic AI jobs in Kubernetes?
Assuming the DSS base port is 9000, I guess I need to allow incoming connections to ports 9000-9010 from the EKS CIDR. But when I used DSS > Administration > Settings > Containerized execution > TEST, I see that it also tries to connect to port 33249: [2023-08-16 12:54:19,130] [1/MainThread] [INFO] [root] Try to ping…
-
Null fields in JSON are ignored and not ingested
Hi! I am new to using Dataiku and noticed that fields in my JSON files that are null are not being ingested. Is this common/expected behaviour? Is there a setting to force ingesting fields in JSON even though they are null in every JSON file I have? I am extracting the JSON files from an S3 bucket. Operating system used: MacOS…
-
S3 output file name
Dear all, When I let my recipe export/store the output in S3 it creates a file with this name: out-s0.csv.gz. Is there a way to change the name of the output file? Kind regards TonyR
-
Fleet Manager: Version
What is the simplest way to determine which version of Fleet Manager we have installed (other than asking the person who set up Fleet Manager in the first place)?
-
Import project. Connection remapping for dataiku-managed-storage
Hi! I'm trying to export a Dataiku project from the Dataiku online service into a local instance. The export goes well, but on import the issues below appear: Issues were encountered * ERROR Missing connection: Connection missing for dataset baseline_fixed (not remapped): dataiku-managed-storage (EC2) I…