-
How to dynamically output recent 3 days' data with partitioned dataset
Hi all, I am new to the Dataiku world and I'd like to ask about the right way to output data for a specific time range using partitioning. What I want to do is dynamically build the most recent 3 days of data from input datasets (using the Time Range day partition). I've tested this, but the output seems to grow even more than I expect, and the…
-
Training Failed – Subprocess Did Not Connect in 60000ms (SocketTimeoutException)
Hi everyone, I'm encountering an issue when trying to train a model in Dataiku DSS. The training fails with the following error: Training failed. Read the logs. Subprocess did not connect in 60000ms, it probably crashed at startup. Check the logs., caused by: SocketTimeoutException: Accept timed out. I am on a MacBook Air, Mac os…
-
How to run integration tests on flows with Python recipes
I've recently started to use the "Run integration test" scenario step for testing. It's definitely some work to create the test reference datasets, but once set up it's great to be able to run this test after later code changes to confirm the process works as expected. Our flows typically mostly use SQL script recipes.…
-
Examples for custom prediction in API Designer
Are there any actually useful code examples of using custom prediction in Python? I have a model that exists in my Flow, and I want to use that model to make a prediction just like the Prediction model API endpoint would do to start, and then add more custom code on top of that. The boilerplate code imports dataiku and…
-
Dataiku
When I try to add some column values into a result column, I get NaN as the value. Operating system used: Windows
-
Unusual Error with Group By Recipe in Dataiku
Hello, I’ve encountered an unusual error while using the Group By recipe in Dataiku. Here’s a summary of the issue: Context: I created a Group By recipe on three columns, applying three custom aggregations using SQL. Input Data: The recipe takes as input a PostgreSQL (PGSQL) table, which is the output of a JOIN operation…
-
Mismatch in random ports for notebook kernels
When I try to use the notebook kernels in containerized environments, I get different ports stored in DSS and in the Kubernetes pod. Is that normal? The file jupyter-run/jupyter/runtime/kernel-c234fbb1-84dc-44ad-a18e-7ad0aff81702.json shows different ports: { "shell_port": 51875, "iopub_port": 38011, "stdin_port": 35689,…
-
Storing and Retrieving Embeddings in Knowledge Bank via Python
Hello Team, I hope you are doing well. I am currently working on a project in Dataiku 13.1.2, where I am generating embeddings using LLM Mesh in Python code. At present, I am storing these embeddings in a PostgreSQL dataset. However, I would like to store them directly into a Knowledge Bank using Python code. Key…
-
Using a variable or not depending on the scenario
Hi guys, I have a flow with 2 different scenarios. I have one variable v_idproduct used in the post-filter of a join recipe, in SQL code (id_product IN v_id_product). In each scenario I have a different list of product IDs. I want to modify one of the scenarios so that this filter is no longer applied, allowing all product IDs…
-
How to get relationships between flow zones and the datasets
What functionality exists to show the relationships between flow zones? Say we have 42 flow zones with an average of 50 datasets each. Is there a way to summarize the relationships? I am interested in seeing things like: Which flow zones have the same color? Which flow zones have datasets that feed into other ones? Which…