-
Questions on Dataiku, EKS, and EMR Serverless for Efficient Data Processing
Hi Dataiku Community, I hope you're all doing well. I wanted to reach out with some questions about our current implementation, where we are using EMR Serverless with TBs of data flowing between Snowflake and S3. * Intermediate Datasets Avoidance: We are evaluating an EKS setup compared to EMR Serverless. How does…
-
[BUG] API Service endpoint with enrichment through SQL connection
Hi everyone! This is a solved issue that might be helpful for someone else. I'm using DSS 9.0, and this was my experience with endpoint enrichment through a SQL connection. When configuring a prediction endpoint to enrich the incoming request with new features, there are two options: 1) Bundle the enrichment dataset…
-
Accessing Spark web UI
Hello, I am a beginner in Spark and I am trying to set up Spark on our Kubernetes cluster. The cluster is now working and I can run Spark jobs; however, I want to access the Spark web UI to inspect how my job is being distributed. We usually port-forward a port (4040), but I am not able to tell which pod is the driver pod…
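For reference, a minimal sketch of how to find and reach the driver, assuming a recent Spark on Kubernetes that applies the standard spark-role pod labels (the namespace and pod name below are placeholders):

```bash
# List driver pods: Spark on Kubernetes labels the driver pod spark-role=driver
kubectl get pods -n <namespace> -l spark-role=driver

# Forward local port 4040 to the driver pod's Spark web UI
kubectl port-forward -n <namespace> <driver-pod-name> 4040:4040
```

With the forward in place, the UI should be reachable at http://localhost:4040.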
-
Spark on Kubernetes - Initial job has not accepted any resources
Hi, We've been having a good experience using Spark and containerized execution on our DSS platform. The next step would be to run Spark on Kubernetes, but we're facing some issues. Things that work: * Building (Spark) base images and code-env specific images * Pushing images to ECR * Starting an EKS cluster (with the same…
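In plain Spark terms, that error usually means no executor ever registered with the driver, often because the executor pods cannot be scheduled (insufficient CPU/memory on the node group) or cannot pull their image. A few hedged diagnostics, with placeholder names:

```bash
# Were executor pods created at all, and in what state are they?
kubectl get pods -n <namespace> -l spark-role=executor

# Inspect a stuck executor pod for FailedScheduling or image-pull errors
kubectl describe pod <executor-pod-name> -n <namespace>

# Recent namespace events often name the reason (e.g. insufficient cpu)
kubectl get events -n <namespace> --sort-by=.lastTimestamp
```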
-
How to integrate & connect Dataiku to a Cloudera Quickstart VM (single-node cluster)
I'm attempting to integrate Dataiku Enterprise Edition with Cloudera 5.12.0 running on a VM. I'm running Dataiku locally on a Mac. The Dataiku documentation stipulates that I need to install Hadoop client libraries (Java JARs) or Hadoop configuration files. The issue is that I can't seem to find either the Java files or the configuration…
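On a Cloudera Quickstart VM the client configuration usually lives under /etc/hadoop/conf and /etc/hive/conf, so one hedged approach is to copy those directories to the DSS machine and re-run the DSS Hadoop integration (the VM address and paths below are illustrative):

```bash
# Copy client configuration from the Quickstart VM to the DSS host
scp -r cloudera@<vm-ip>:/etc/hadoop/conf ./hadoop-conf
scp -r cloudera@<vm-ip>:/etc/hive/conf ./hive-conf

# With the Hadoop client libraries installed locally and DSS stopped,
# re-run the Hadoop integration from the DSS data directory
<dss-data-dir>/bin/dss stop
<dss-data-dir>/bin/dssadmin install-hadoop-integration
<dss-data-dir>/bin/dss start
```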
-
Hive metastore synchronization fails (GSS initiate failed: Server not found in Kerberos database)
DSS 4.0. When trying to synchronize the metastore, I get this error: [18:20:40] [ERROR] [org.apache.thrift.transport.TSaslTransport] running compute_sfpd_incidents_sample_prepared_NP - SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided…
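"Server not found in Kerberos database" generally means the client asked the KDC for a service principal that does not exist, most often because _HOST in hive.metastore.kerberos.principal was expanded from a hostname whose DNS records do not match the principal registered for the metastore. Some hedged checks (hostnames are placeholders):

```bash
# Confirm the client holds a valid ticket
klist

# See which principal the client expects for the metastore (_HOST expansion)
grep -A1 'hive.metastore.kerberos.principal' /etc/hive/conf/hive-site.xml

# Verify that forward and reverse DNS for the metastore host agree
host <metastore-hostname>
host <metastore-ip>
```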
-
Spark on local machine (where DSS is installed) + Spark on another cluster
Is it possible to configure Spark in DSS so that we can choose between running Spark on the DSS machine (the local machine) or running the Spark job on a Spark installation on another cluster? Additionally, how do we configure DSS to interact with Spark on another cluster?
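In plain Spark terms this comes down to spark.master: local[*] keeps execution on the DSS machine, while yarn (with HADOOP_CONF_DIR pointing at the remote cluster's client configuration) or a standalone spark://host:7077 master sends the job to another cluster. A hedged sketch of the equivalent spark-submit invocations (my_job.py and the host are placeholders):

```bash
# Run on the machine where DSS is installed (local mode)
spark-submit --master 'local[*]' my_job.py

# Run on a remote YARN cluster: point the client at that cluster's config
export HADOOP_CONF_DIR=/path/to/remote-cluster-conf
spark-submit --master yarn --deploy-mode cluster my_job.py

# Or target a standalone Spark master running on another host
spark-submit --master spark://<cluster-host>:7077 my_job.py
```

In DSS itself, this choice maps to named Spark configurations (Administration > Settings > Spark), where each configuration can carry its own spark.master and be selected per recipe.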