Hi Dataiku Community, I hope you're all doing well. I wanted to reach out with some questions regarding our current implementation, where we are using EMR Serverless with TBs of data flowing between Snowflake and S3. * Intermediate Datasets Avoidance: We are looking into an EKS setup compared to EMR Serverless. How does…
Hi everyone! This is a solved issue that might be helpful for someone else. I'm using DSS 9.0, and this was my experience with endpoint enrichment through a SQL connection. When configuring a prediction endpoint to enrich the incoming request with new features, there are two options: 1) Bundle the enrichment dataset…
Hello, I am a beginner in Spark and I am trying to set up Spark on our Kubernetes cluster. The cluster is now working and I can run Spark jobs; however, I want to access the Spark web UI to inspect how my job is being distributed. We usually port-forward port 4040, but I am unable to determine which pod is the driver pod…
Hi, We've been having a good experience using Spark and containerized execution on our DSS platform. The next step would be to run Spark on Kubernetes, but we're facing some issues. Things that work: * Building (Spark) base images and code-env specific images * Pushing images to ECR * Starting an EKS cluster (with the same…
I'm attempting to integrate Dataiku Enterprise Edition with Cloudera 5.12.0 running on a VM. I'm running Dataiku locally on a Mac. The Dataiku documentation stipulates that I need to install the Hadoop client libraries (Java JARs) or Hadoop configuration files. The issue is that I can't seem to find either the Java files or configuration…
Is it possible to configure Spark on DSS so that we can choose between running Spark on the DSS machine (the local machine) and running the Spark job on a Spark installation on another cluster? Additionally, how do we configure DSS to interact with Spark on another cluster?