How to use Spark on Dataiku Cloud
When I try to use Spark on a Dataiku Cloud instance launched in a GCP environment, I get an error. The documentation says that Spark is available by default on Dataiku Cloud, but running a PySpark recipe produces the error shown in the image. Do I need to configure something? Operating system used: Windows
How to use "Execute Python unit test" scenario step
A new scenario step was added in a recent DSS version which is to execute a Python unit test. I'd like to start using this. However, the documentation is pretty brief: "this step executes one or more Python pytest tests from a project’s Libraries folder using a Pytest selector". Anyone have more details on or an example of…
Custom trigger does not execute Python code
Hi, I am quite new to Dataiku and would like to understand why the following code does not work as expected. Namely, I am trying to define a custom trigger that will check if the folder is empty. Both from dataiku.scenario import Trigger t=Trigger() folder = dataiku.Folder("folder_id") files = folder.list_paths_in_partition() if…
Add Venn diagram and UpSet plot to Charts
I'm encountering some use cases where I want to easily visualize the number of records belonging to one or several groups and their overlap where group membership is spread over multiple 1/0 columns. Would be super handy to have Venn diagrams in the Charts or, sometimes even better, UpSet plots.
Group by with empty value and with Null value
Hello everyone! I have a dataset with empty values in one of the columns (col1) and I use a group by recipe on another column (col2) without empty values with col1_distinct as aggregation. I get a volume of 21, 199 and 1608 for the 3 col2 fields. But I wanted to add a condition on col1 with a prepare recipe with a…
I want to extend OpenVPN functionality to APIs
Is there an existing request to extend OpenVPN functionality to APIs, rather than just databases and storage with connectors? We would like to use OpenVPN to connect via API from an application operating in a closed network environment, but the current functionality does not allow us to connect. If you have the same request, we would be…
schema propagation problem
In the dataset Explore view, I can define a 'description' using 'edit column schema', and I can propagate the schema to the rest of the flow using 'schema propagation'. However, the description is sometimes not inherited (for example, when there is a 'prepare recipe' in the middle). I want to know how to make it inherit reliably.
DataScienceStudio.app not updating to DSS Release 13.4.0
Hi, usually the DataScienceStudio.app detects new DSS releases and asks the user to update. But this time, I am not getting any update notification even though a newer version of Dataiku is available. My current version: 13.3.0 Latest version: 13.4.0 Operating system used: macOS 15
How to use Notebook
Hi Team, I have signed in with my Gmail account and am not able to use the environment. Can someone help me get started with this environment or with notebooks so I can work on a sample hackathon?
vector database
How can I use a vector database in Dataiku with on-premises LLMs? Operating system used: Red Hat Enterprise Linux 8.5 (Ootpa), CPE OS name: cpe:/o:redhat:enterprise_linux:8::baseos, kernel: Linux 4.18.0-348.20.1.el8_5.x86_64, architecture: x86-64
Google Workspace as SAML SSO provider for DSS?
Is there anyone out there using a Google Workspace Domain to set up a single sign-on environment for Dataiku DSS? Operating system used: Linux
How does the evaluation store threshold actually work?
In the documentation for the evaluation store, when doing a two-class (binary) classification, there is a slider for the threshold used. The documentation for this threshold reads in part: When doing binary classification, most models don’t output a single binary answer, but instead a continuous “score of being positive”.…
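As a rough illustration of the thresholding idea quoted above, the sketch below turns continuous scores into binary labels with a cut-off. The function name `apply_threshold`, the sample scores, and the choice of `>=` at the boundary are assumptions for illustration, not Dataiku's implementation.

```python
def apply_threshold(scores, threshold):
    """Label a score as positive (1) when it reaches the threshold, else 0."""
    # Assumption: a score exactly equal to the threshold counts as positive.
    return [1 if score >= threshold else 0 for score in scores]


# Hypothetical "score of being positive" values for four records.
scores = [0.10, 0.45, 0.50, 0.90]

low_cutoff = apply_threshold(scores, 0.3)
high_cutoff = apply_threshold(scores, 0.6)
```

Raising the threshold can only keep or reduce the number of predicted positives, which is why metrics such as precision and recall shift as the slider moves.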
Dash Long Callbacks Not Working
Hi all, I'm struggling to get long callbacks to work in Dataiku. When I initialize the app = dash.Dash() instance, the application does not run at all. When I remove it, the application runs but the callback does not work at all. Currently, it only works with a regular callback but I need it to work with a long callback.…
Scenario steps documented in Project Documentation
I see that scenarios are not documented in the automatically generated project documentation. Adding them would greatly help to document how our automations are orchestrated.
Dashboard Improvements on Reference Lines
Looking for two dashboard enhancements on reference line tuning. 1. Reference line value: currently, if "Constant" is chosen as the source, a manual value must be entered. It would be beneficial to allow the use of a global variable in the value field. 2. Ability to add an aggregation of a different dataset column: Have a dynamic…
How can I make a Django app in Dataiku?
I'm looking for information regarding coding a Django application in Dataiku. Any information on how to achieve this? Thanks
Cannot register DSS after first-time installation
I am facing this error: Network Error: An attempt to communicate with DSS failed. Please check your network connectivity. Operating system used: RHEL-8.9
How to set up a random forest regression?
Select Columns Outside of Join Recipe
I would like to be able to select the columns of data outside of a join recipe. A couple of examples: 1 - Usage of "unmatched rows". The column selection occurs after the join and does not apply to data that isn't joined. In this case I am using both sets of data, so I need the option to select columns from both sets. 2 - Removal…
Option to display short descriptions on flow
Hi All, Forgive me if this has been discussed before, or if it is a polarizing topic as far as visual design goals. In evaluating Dataiku against other products, and ultimately deciding on Dataiku due to its many strengths, one thing my team lamented was that it was not possible to display descriptions of flow elements on…
Dataiku users from Romania
Are there any other community members from Romania? PS: I also started this thread as a log for tracking personal progress. Day one (13.01.2025): installed the application; Data Preparation Quick Start, 8 of 8 lessons completed (100%)
Omitting quotes around scenario string variables in a Freemarker email template
Hi, I have a scheduled project scenario that sends an email on some condition. The scenario contains a step that sets scenario variables based on values in a dataset. Here's that step: import dataiku import dataiku.scenario # Read the dataset df = dataiku.Dataset("node-disk-usage").get_dataframe() use_percentage_threshold…
About deployer infrastructure setting
I would like to know how to configure the K8s-related settings in the Dataiku Deployer infrastructure so that the ingress controller (NGINX) option becomes visible in the service exposition. Of course, I know that the ingress controller needs to be set up in K8s beforehand. However, I would like to know how to make the Ingress…
Is the person I'm talking to on WhatsApp really from Dataiku?
I'm not convinced.
SQL Step to copy table
Hi, I was trying to run an INSERT statement as a step in a scenario, but it loads only 10 records. Can you please guide me on what the issue could be?
How to check for consecutive monthly buys
I have a dataset with the purchasing history of many items over the past 10 years. I want to pull out only the items that have been purchased every month for the last 10 years. How do I go about this? Operating system used: macOS
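One possible approach, sketched in pandas under stated assumptions: the column names `item` and `purchase_date` are hypothetical, the sample rows are made up, and the window is shortened to three months so the example stays readable. The idea is to count the distinct calendar months in which each item was bought and keep the items whose count equals the number of months in the window.

```python
import pandas as pd


def items_bought_every_month(df, item_col, date_col, start, end):
    """Return the items with at least one purchase in every month of [start, end]."""
    # Every calendar month the item must appear in, as "YYYY-MM" strings.
    expected = list(pd.period_range(start=start, end=end, freq="M").strftime("%Y-%m"))
    # Reduce each purchase date to its calendar month.
    month = pd.to_datetime(df[date_col]).dt.strftime("%Y-%m")
    in_window = df.loc[month.isin(expected), [item_col]].assign(month=month)
    # Count distinct months per item and keep items that cover the whole window.
    months_per_item = in_window.groupby(item_col)["month"].nunique()
    return sorted(months_per_item[months_per_item == len(expected)].index)


# Made-up sample: item A is bought in all three months, item B skips February.
purchases = pd.DataFrame(
    {
        "item": ["A", "A", "A", "B", "B"],
        "purchase_date": ["2024-01-05", "2024-02-10", "2024-03-15", "2024-01-05", "2024-03-15"],
    }
)

regular_items = items_bought_every_month(purchases, "item", "purchase_date", "2024-01", "2024-03")
```

For the ten-year case, the same function would be called with `start` and `end` ten years apart, giving a window of 120 months.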
How do I use training and prediction dataset together in Dataiku
Hi, I'm using Dataiku version 13.1. I need to do text prediction using BERT, for which I have a training dataset. After training, I need to score a prediction dataset. I'm running BERT in a Python code recipe. Can you suggest the steps to score the prediction dataset?
Allow nested flow zones
Hi, I use flow zones a lot and appreciate the value. Why not extend the capability and allow nested flow zones, i.e. a flow zone within a flow zone? thx
Cannot Create Published API Service - Already have API node
Hi, I'm trying to publish my model to an API node. I'm currently working on the development side, which already consists of a Design node, a Govern node, and an API node. I keep getting the error "not authorized - Cannot Create Published API Service". How can I address this issue? Does it have anything to do with my user license? Thanks Operating…
ProcessDiedException - can't access the Flow
Hi, I was migrating a project from a localhost Dataiku to a shared local network instance and after importing the project, I can't access the flow screen. All I get is "HTTP code: 500, type: com.dataiku.dip.exceptions.ProcessDiedException" error message. Is there a way to debug this and find the underlying issue? I found…
Error While Accessing visual flow
"detailedMessage": "Cannot run program \"/data/dataiku/bin/jek\": error\u003d0, Failed to exec spawn helper: pid: 2472572, exit value: 1, caused by: IOException: error\u003d0, Failed to exec spawn helper: pid: 2472572, exit value: 1" I received this error message when I tried to delete rows in a visual recipe, and it is affecting all…
Why does Dataiku allow two web-apps with the same name?
I was surprised to find Dataiku allows two web-apps with the same name to exist. Why? The expected behavior would be to ask the user if they want to overwrite a published web-app when name collision occurs. thx Operating system used: Windows 10
Integrating Dataiku with Denodo
Has anyone created an integration with Denodo? If so, did you use a JDBC connection? How did you manage user permissions? Thanks! Operating system used: Red Hat
Disabling "Update output schemas" in a Python scenario
Hi, how do I disable the option that updates the output schema in a Python scenario? I can't find the option in the API reference. It would be the equivalent of the option circled below. Thanks Operating system used: Windows
API Service: Python Prediction Endpoint vs Python Function
Hi, I have a general question regarding the difference between the python prediction and python function endpoints in the API Service in regards to serving a custom python model. From my understanding, the only advantage that the python prediction endpoint has over the python function endpoint is the ability to…
Merge/Group rows based on metrics
I am trying to merge/group rows based on a metric range. Present format Desired format
How do I set the logging level for Python recipes?
import logging logging.basicConfig(level=logging.ERROR) I have already tried the code above, but my Python recipe output still shows DEBUG and INFO logs. This makes it very hard for me to find the output from my tqdm progress bar. I am using DSS 13.3.3 Operating system used: Linux
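One thing that may explain this: `logging.basicConfig` does nothing when the root logger already has handlers, which is plausible if the platform configures logging before the recipe code runs (an assumption, not confirmed in this thread). Below is a minimal sketch that sets the level on the root logger and its existing handlers directly; the helper name `quiet_root_logger` is hypothetical. It only affects messages emitted through Python's `logging` module in the same process.

```python
import logging


def quiet_root_logger(level=logging.ERROR):
    """Raise the level of the root logger and of every handler already attached to it."""
    root = logging.getLogger()
    # Setting the level directly works even when handlers were installed earlier,
    # a case in which logging.basicConfig(level=...) is silently ignored.
    root.setLevel(level)
    for handler in root.handlers:
        handler.setLevel(level)
    return root


root_logger = quiet_root_logger(logging.ERROR)

# These two calls are now filtered out; only ERROR and above get through.
logging.debug("hidden debug message")
logging.info("hidden info message")
```

On Python 3.8 and later, `logging.basicConfig(level=logging.ERROR, force=True)` is an alternative that removes the existing root handlers and reconfigures the root logger.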
Ability to package environment/local variables with an API service
It would be very helpful if Dataiku allowed for packaging variables (either environment or local variables) with the capability to remap local variables as part of the deployment. Ideally there would also be an option to encrypt a variable. We have several API services that connect to other systems and require environment…
Snowflake connection works fine from Dataiku, but fetching the table preview throws an error
Failed to read data from table Failed to read data from table, caused by: SnowflakeSQLException: JDBC driver internal error: exception creating result java.lang.NoClassDefFoundError: Could not initialize c