Setup & Configuration

Questions on Dataiku, EKS, and EMR Serverless for Efficient Data Processing
Hi Dataiku Community, I hope you're all doing well. I wanted to reach out with some questions regarding our current implementation, where we are utilizing EMR Serverless with TBs of data flowing between Snowflake and S3. * Intermediate Datasets Avoidance: We are looking into EKS setup compared to EMR Serverless. How does…
[BUG] API Service endpoint with enrichment through SQL connection
Hi everyone! So, this is a solved issue that might be helpful for someone else. I'm using DSS 9.0 and this was my experience with endpoint enrichment through SQL connection. When configuring a prediction endpoint to enrich the incoming request with new features, there are two options: 1) Bundle the enrichment dataset…
Accessing Spark web UI
Hello, I am a beginner in Spark and I am trying to set up Spark on our Kubernetes cluster. The cluster is now working and I can run Spark jobs; however, I want to access the Spark web UI to inspect how my job is being distributed. We usually port-forward a port (4040), but I am not able to check which pod is the driver pod…
Spark on Kubernetes - Initial job has not accepted any resources
Hi, We've been having a good experience using Spark and containerized execution on our DSS platform. The next step would be to run Spark on Kubernetes, but we're facing some issues. Things that work: * Building (Spark) base images and code-env specific images * Pushing images to ECR * Starting an EKS cluster (with the same…
How to integrate & connect Dataiku to Cloudera Quickstart VM (single-node cluster)
I'm attempting to integrate Dataiku Enterprise Edition with Cloudera 5.12.0 running on a VM. I'm running Dataiku locally on a Mac. The Dataiku documentation stipulates that I need to install Hadoop client libraries (Java jars) or Hadoop configuration files. The issue is I can't seem to find either the Java files or configuration…
Spark on local machine (where DSS is installed) + Spark on another cluster
Is it possible to configure Spark on DSS so that we can choose between "run Spark on the DSS machine (local machine)" and running the Spark job with a Spark installation on another cluster? Additionally, how do we configure DSS to interact with Spark on another cluster?


