-
DSS running out of memory too fast
Hello! I'm currently working on a DSS instance, having fun building data pipelines, but it runs out of memory after 2 hours. DSS is installed manually on a Docker image. We tried several setups with the 'xmx' configuration, but we end up with many idle processes (~20 JEK) that fill our allocated memory. The number of JEK is…
-
Read Excel file using Python pandas
Hi, I went through the community and couldn't find a solution for my issue when I try to import Excel files from SFTP using a Python recipe. I'm using the following code:
import dataiku
import pandas as pd
import numpy as np
from dataiku import pandasutils as pdu
FOLDER_NAME = 'folder_1'
FILE_NAME = 'file_1.xlsx'
DATASET_NAME =…
-
Use Prometheus instead of Graphite for monitoring
The https://doc.dataiku.com/dss/latest/operations/monitoring.html#concepts documentation describes how to push DSS metrics to a Graphite server. However, is it possible to push metrics to an existing Prometheus environment (already in use as a central monitoring solution) instead of having to deploy a specific Graphite…
-
Queued Activities and Job Prioritization
Is there a way to:
* query the number of activities waiting due to insufficient slots
  i. at any given time
  ii. historically?
* differentiate an activity in waiting status because it’s awaiting an upstream dependent activity vs a job slot vs a global slot?
* prioritize jobs over others? E.g. a priority param per job?
* If not,…
-
Scaling Horizontally & Resource management with Kubernetes
Resource management & scaling horizontally with Kubernetes. Given that activities within the same job run as threads within the same JEK process, how do you size the Kubernetes pods accordingly? A job may contain sections where 20 activities can run concurrently, or it may be completely sequential and use only 1 core. a) Given…
-
Notifications do not obey user settings
Under My Settings is an option to allow or disable various notifications (user log in/out, edit of watched objects, etc). However, the appearance of these notifications in the bottom right corner of the screen does not follow the user setting specified. For example, my settings have user log in/out disabled, but I am…
-
Retrieval of the node type and the DSS version by means of the API
Hello, while accessing a DSS instance through the API, I am able to retrieve the node name ('nodeName' in the data returned by the "get_general_settings" method of DSSClient), but I can find neither the node type nor the DSS version of the instance. Is there a way to get this information by means of the API from…
-
Output of a random forest classification
Hi, I hope you are doing well. I have a binary classification with two classes, 0 and 2, and when I test my model on another dataset I get 3 columns in the output: the probability of being 0 (proba_0), the probability of being 2 (proba_2), and the class (0 or 2). The logic is that if proba_0 > 0.5, the algorithm must predict 0…
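The rule the poster expects can be written as a minimal sketch. The function name `predict_class` and the `threshold` parameter are hypothetical, introduced only for illustration; this does not describe how DSS itself assigns the class, and a model may be configured with a cut-off other than 0.5.

```python
def predict_class(proba_0, proba_2, threshold=0.5):
    """Return 0 when proba_0 exceeds the threshold, otherwise 2.

    Hypothetical helper: it only encodes the rule stated in the post.
    """
    # For a binary classifier the two probabilities should sum to 1.
    if abs(proba_0 + proba_2 - 1.0) > 1e-6:
        raise ValueError("proba_0 and proba_2 should sum to 1")
    return 0 if proba_0 > threshold else 2


# Made-up (proba_0, proba_2) pairs, purely for illustration.
rows = [(0.8, 0.2), (0.3, 0.7), (0.5, 0.5)]
predictions = [predict_class(p0, p2) for p0, p2 in rows]
```

Comparing these values with the class column in the scored dataset shows which rows depart from a plain 0.5 cut-off.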
-
Ways to limit job log size
Hi, is there a way to limit the log size when running a job? Sometimes we run a job and it fails. When this happens, all temp data and logging are stored, which leads to disk space issues. In no time, over 100 GB of logs is created.
-
Memory usage per scenario
As a follow-up to https://community.dataiku.com/t5/Setup-Configuration/Feature-request-Monitor-memory-usage-per-scenario/m-p/1823#M322, it sure seems like this has been implemented via https://doc.dataiku.com/dss/latest/operations/monitoring.html#configure-dss-to-push-metrics, which I've done. I'm now seeing this in job…