-
Recipe execution takes a long time due to a large volume of data in Dataiku
We are experiencing long execution times for a recipe in Dataiku due to handling large datasets. Although we have implemented partitioning using a filter on a specific column, it still takes 1.5-2 hours to partition 30M records. Is there a more efficient way to handle and process this data quickly and effectively because…
-
How to run integration tests on flows with Python recipes
I've recently started to use the "Run integration test" scenario step for testing. It's definitely some work to create the test reference datasets, but once set up it's great to be able to run this test after later code changes to confirm the process works as expected. Our flows mostly use SQL script recipes.…
-
Looking to replicate a SUM(COUNTIF) formula in Dataiku
I am working on a scorecard in Dataiku and I would like to calculate the percentage of completion across a set number of columns. Basically, I would like to replicate this Excel formula: =SUM(COUNTIF(ColumnX:ColumnXX,"*")/Total Number of Columns) and am having issues. The columns are a mix of strings, integers, and text,…
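One possible way to approach this per-row completion percentage in a Python recipe is sketched below with pandas. It assumes "complete" means a cell that is neither null nor a blank string, so numeric cells count as filled too; the column names `q1`, `q2`, `q3` and the helper `completion_rate` are hypothetical and only for illustration. In a Dataiku Python recipe the DataFrame would typically come from `dataiku.Dataset(...).get_dataframe()` rather than being built inline.

```python
import pandas as pd


def is_blank(value):
    """True for empty or whitespace-only strings; False for anything else."""
    return isinstance(value, str) and value.strip() == ""


def completion_rate(df, columns):
    """Per-row fraction of `columns` that are filled in (hypothetical helper)."""
    subset = df[columns]
    # Flag blank strings cell by cell; non-string values are never blank.
    blank = subset.apply(lambda col: col.map(is_blank)).astype(bool)
    # A cell is filled when it is not null and not a blank string.
    filled = subset.notna() & ~blank
    # Count filled cells in each row and divide by the number of columns.
    return filled.sum(axis=1) / len(columns)


# Illustrative data only: a mix of strings, numbers, nulls, and blanks.
scores = pd.DataFrame({
    "q1": ["yes", None, "no"],
    "q2": [3, 5, None],
    "q3": ["", "done", "  "],
})
scores["pct_complete"] = completion_rate(scores, ["q1", "q2", "q3"])
```

The result is a fraction between 0 and 1 for each row; multiply by 100 if the scorecard should show a percentage.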
-
Recipe failed but notebook runs (same code environment)
My recipe failed to resolve a pip package I installed in the code env. The same code runs fine in a notebook but fails in a recipe using the same code env. I suspect the recipe uses an old version of the code env, but I have no way of forcing it to use the latest code env build. How can I do that? Operating system used: Windows Operating…
-
Copy code recipe from flow
Hi, I have many similar but different code recipes. It would be nice to have a flow right-click->Copy option that creates a new identical recipe with the output not yet named but with the inputs configured and the code copied. I currently just create a new blank recipe and configure it with the same inputs, change the…