-
About changing the execution disk
I am using the free version of Dataiku on Windows. I am still a beginner, so I am running the Academy tutorial. The web address is http://localhost:11200/projects/. I would like to create various samples, but I don't have enough space on my HDD. Is it possible to change the drive DSS runs on? If possible, a…
-
Queued Activities and Job Prioritization
Is there a way to:
* query the number of activities waiting because of insufficient slots
  i. at any given time?
  ii. historically?
* differentiate an activity that is waiting for an upstream dependent activity from one waiting for a job slot or a global slot?
* prioritize some jobs over others, e.g. with a priority parameter per job?
* If not,…
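To make these three requests concrete, here is a minimal Python sketch of a generic slot-limited queue. It is not a DSS API: the `SlotScheduler` and `QueuedActivity` classes, the `priority` field and the `upstream_done` flag are all hypothetical. It records why each queued activity is waiting and dispatches ready activities by priority.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedActivity:
    priority: int  # hypothetical per-job priority: lower value is dispatched first
    seq: int       # submission counter, keeps FIFO order within one priority
    name: str = field(compare=False)
    upstream_done: bool = field(compare=False, default=True)


class SlotScheduler:
    """Hypothetical scheduler with a fixed number of slots and a priority queue."""

    def __init__(self, slots: int):
        self.free_slots = slots
        self.running: list[str] = []
        self._queue: list[QueuedActivity] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: int, upstream_done: bool = True) -> None:
        heapq.heappush(
            self._queue,
            QueuedActivity(priority, next(self._counter), name, upstream_done),
        )

    def waiting_reasons(self) -> dict[str, str]:
        """Label each queued activity: blocked upstream, or only short of a slot."""
        return {
            act.name: "upstream" if not act.upstream_done else "slot"
            for act in self._queue
        }

    def dispatch(self) -> list[str]:
        """Start ready activities in priority order while slots remain."""
        started, blocked = [], []
        while self._queue and self.free_slots > 0:
            act = heapq.heappop(self._queue)
            if not act.upstream_done:
                blocked.append(act)  # cannot run yet, whatever its priority
                continue
            self.free_slots -= 1
            self.running.append(act.name)
            started.append(act.name)
        for act in blocked:  # put blocked activities back in the queue
            heapq.heappush(self._queue, act)
        return started
```

Counting the `"slot"` entries returned by `waiting_reasons()` answers "how many activities are waiting for a slot right now"; recording that count periodically would give the historical view. A fuller version would track job slots and global slots as separate counters.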
-
Scaling Horizontally & Resource management with Kubernetes
Given that activities within the same job run as threads within the same JEK process, how do you size the Kubernetes pods accordingly? A job may contain sections where 20 activities can run concurrently, or it may be completely sequential and use only 1 core. a) Given…
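One way to reason about the sizing question is to measure the peak number of activities that run at the same time, since with one thread per activity this bounds how many cores the JEK process can use at once. The sketch below is a minimal sweep-line count. It assumes you can obtain a `(start, end)` time for each activity (for example from past job logs; the input format is hypothetical) and that each running activity uses about one core, which is a simplification.

```python
from typing import Iterable, Tuple


def peak_concurrency(intervals: Iterable[Tuple[float, float]]) -> int:
    """Return the maximum number of activities running at the same time.

    Each interval is (start, end) for one activity. An activity ending at
    time t is not counted as overlapping one that starts at t.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))  # activity starts
        events.append((end, -1))   # activity ends
    # Sort by time; at equal times (t, -1) sorts before (t, 1), so ends come first.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak
```

Sizing a pod for the peak leaves cores idle during sequential sections, while sizing for a typical value makes concurrent sections queue. The sweep only provides the numbers for that trade-off.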
-
Ways to limit job log size
Hi, is there a way to limit the log size when running a job? Sometimes a job we run fails. When this happens, all temporary data and logs are stored, which leads to disk space issues: over 100 GB of logs can accumulate in no time.
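The question does not identify which DSS setting controls job log size. The following is therefore only a generic illustration of capping log volume for a Python process you control (for example, code inside a Python recipe), using the standard library's `logging.handlers.RotatingFileHandler`. The helper name `make_capped_logger` and the sizes are hypothetical, and this does not change how DSS writes its own job logs.

```python
import logging
import os
from logging.handlers import RotatingFileHandler


def make_capped_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Return a logger whose files total at most about max_bytes * (backups + 1)."""
    logger = logging.getLogger(f"capped.{os.path.abspath(path)}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also send records to the root logger's handlers
    # When the file would exceed max_bytes, it is renamed to .1 (older files
    # shift up) and files beyond `backups` are deleted.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Because rotation discards the oldest file, disk use stays bounded even if a failing job logs in a tight loop. The trade-off is that the earliest messages of a long failure are lost.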
-
Accessing Spark web UI
Hello, I am a beginner in Spark and I am trying to set up Spark on our Kubernetes cluster. The cluster is now working and I can run Spark jobs; however, I want to access the Spark web UI to inspect how my job is being distributed. We usually port-forward port 4040, but I am not able to tell which pod is the driver pod…
-
Partition Project Permission between Multiple Administrators
We have multiple administrators in our Dataiku instance. Let us say they are administrators A and B. Is it possible to give a specific project permission only to administrator A and not to B? Can the project/environment permissions between administrators in the same instance be partitioned?
-
Moving the dss_home directory on MacOSX
Hi, I'm currently running the latest version of DSS on macOS. After months of use, I noticed that it has consumed a large portion of my hard drive. I'm running an older MacBook Air model that came with 250 GB of storage. When I first set it up, I did the…
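Before moving `dss_home`, it can help to see which of its subdirectories takes the space. The following minimal Python sketch sums file sizes under each top-level subdirectory and skips symlinks. The function name `usage_by_subdir` is hypothetical, and the directory you pass in is your own `dss_home` location.

```python
import os
from pathlib import Path


def usage_by_subdir(root: str) -> dict[str, int]:
    """Sum file sizes (in bytes) under each immediate subdirectory of root."""
    totals: dict[str, int] = {}
    for entry in Path(root).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            size = 0
            # os.walk does not follow directory symlinks by default.
            for dirpath, _dirnames, filenames in os.walk(entry):
                for name in filenames:
                    fp = os.path.join(dirpath, name)
                    if not os.path.islink(fp):
                        size += os.path.getsize(fp)
            totals[entry.name] = size
    # Largest first, so the biggest consumers are easy to spot.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

For example, `print(usage_by_subdir("/path/to/dss_home"))`, with the placeholder replaced by your actual path, lists subdirectories from largest to smallest. Files sitting directly in the root are not counted.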
-
DSS Memory Optimization tips: Backend, Python/R, Spark jobs
Sometimes, users running DSS jobs and processes can get one of the following errors: "OutOfMemoryError: Java Heap Space" or "GC overhead limit exceeded". This happens when a Java process (like the DSS backend or a Spark job) exceeds its maximum memory allocation, called the "Xmx". In this article, we try to present a few…
-
Define global job limit on recipe engine
DSS allows defining global limits based on recipe type or tag (https://doc.dataiku.com/dss/latest/flow/limits.html). However, I don't know how to set a limit per engine. For example, I would like to run at most 10 Impala queries and 10 Spark jobs at a time, so at most 20 jobs in total. But if the scenario is…
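The nested limit described here (a cap per engine plus a cap on the total) can be illustrated with plain Python semaphores. This is a generic concurrency sketch, not a DSS setting or API. The `EngineLimiter` class and the engine labels are hypothetical. The question's example would be `EngineLimiter({"impala": 10, "spark": 10}, global_limit=20)`.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class EngineLimiter:
    """Hypothetical limiter: a cap per engine plus one global cap across engines."""

    def __init__(self, per_engine: dict[str, int], global_limit: int):
        self._engine_slots = {
            name: threading.BoundedSemaphore(n) for name, n in per_engine.items()
        }
        self._global_slots = threading.BoundedSemaphore(global_limit)
        self._lock = threading.Lock()  # protects the counters below
        self.active = {name: 0 for name in per_engine}
        self.peak = {name: 0 for name in per_engine}
        self.peak_total = 0

    def run(self, engine: str, task: Callable[[], T]) -> T:
        # Take the engine slot first, then the global slot. Only tasks holding
        # both actually run, and they release both when done, so waiting tasks
        # always make progress.
        with self._engine_slots[engine], self._global_slots:
            with self._lock:
                self.active[engine] += 1
                self.peak[engine] = max(self.peak[engine], self.active[engine])
                self.peak_total = max(self.peak_total, sum(self.active.values()))
            try:
                return task()
            finally:
                with self._lock:
                    self.active[engine] -= 1
```

With 10 + 10 per engine and a global cap of 20, the global cap is never the binding one. It only has an effect when it is set below the sum of the per-engine caps.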
-
Recommendation for the sizing of the DSS installation server for 25 users
Hello, For about 25 users, what would be your recommendation for the sizing of the DSS installation server in terms of CPU, RAM, storage, ...? Thank you :)