-
Option to rearrange output columns in join recipe
I would like to have the option to rearrange output columns in the join recipe, perhaps by making the 'hamburger' icons on the Output panel draggable.
-
Properly implement support for Building Flow Zones in Scenarios and the Dataiku API
In Dataiku v12.0.0, a new feature was added that allows users to build flow zones from the flow UI: https://knowledge.dataiku.com/latest/data-preparation/pipelines/tutorial-build-modes.html#build-a-flow-zone This works well; however, the capability was never properly added to Scenarios or to the Dataiku API. In 12.1.0…
-
Smart indexing: recommend index based on downstream recipes
It would be helpful if on the index selection menu for a dataset, some smart values could be displayed based on downstream recipes, and if in the recipe creation views, upstream datasets could be reindexed to optimize them as well. For example, after I've created two join recipes downstream of a dataset, on the index…
-
Looking to replicate a SUM(COUNTIF) formula in Dataiku
I am working on a scorecard in Dataiku and I would like to calculate the percentage of completion across a set number of columns. Basically, I would like to replicate this formula from Excel: =SUM(COUNTIF(ColumnX:ColumnXX,"*")/Total Number of Columns) and am having issues. The columns are a mix of strings, integers, and text,…
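One way to picture the calculation described above is as a per-row ratio of filled cells to total columns. The following is only a plain-Python sketch, not a Dataiku formula: the function name `completion_ratio` and the sample column names are hypothetical, and it assumes that numbers (including 0) count as "completed", which differs from Excel's `COUNTIF(...,"*")`, where the wildcard matches text cells only.

```python
def is_filled(value):
    """Decide whether a cell counts as completed (assumption: any non-empty value)."""
    if value is None:
        return False
    if isinstance(value, float) and value != value:  # NaN is treated as empty
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True  # integers and other values, including 0, count as filled


def completion_ratio(row, columns):
    """Fraction of the given columns that hold a non-empty value in this row."""
    filled = sum(1 for col in columns if is_filled(row.get(col)))
    return filled / len(columns)


# Hypothetical scorecard row with four tracked columns.
example_row = {"q1": "yes", "q2": "", "q3": 0, "q4": None}
example_ratio = completion_ratio(example_row, ["q1", "q2", "q3", "q4"])
```

Here `q1` and `q3` are counted as filled, so the ratio is 2 out of 4. In a Python recipe the same function could be applied to each row of the dataset; whether 0 or whitespace should count as completed depends on the scorecard's rules.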
-
Unique key detector tool
What's your use case? More often than not, you have to deal with a dataset without knowing what makes a row unique. This can lead to misinterpreting the data, Cartesian products at join time, and other funny stuff. What's your proposed solution? This is a feature I haven't seen in any data preparation/ETL tool. The core feature is to detect…
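The excerpt above is cut off before it describes the detection itself, so the following is only a guess at what such a tool might do: a brute-force sketch that returns the smallest column combinations whose values are unique across all rows. The function name `find_candidate_keys`, the `max_size` cap, and the sample data are all hypothetical, and this is not an existing Dataiku feature.

```python
from itertools import combinations


def is_unique(rows, combo):
    """True if no two rows share the same values for the columns in combo."""
    seen = set()
    for row in rows:
        key = tuple(row[col] for col in combo)
        if key in seen:
            return False
        seen.add(key)
    return True


def find_candidate_keys(rows, columns, max_size=3):
    """Return the smallest unique column combinations, trying sizes 1..max_size."""
    for size in range(1, max_size + 1):
        found = [c for c in combinations(columns, size) if is_unique(rows, c)]
        if found:
            return found  # stop at the first size that yields any key
    return []


# Hypothetical sample: no single column is unique, but two pairs are.
sample_rows = [
    {"country": "FR", "year": 2020, "value": 1},
    {"country": "FR", "year": 2021, "value": 1},
    {"country": "DE", "year": 2020, "value": 2},
]
candidate_keys = find_candidate_keys(sample_rows, ["country", "year", "value"])
```

The number of combinations grows quickly with the column count, so a real tool would probably need sampling or a cap such as `max_size`. A key that is unique on a sample is also only a candidate, not a guarantee for the full dataset.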
-
Data Upsert
Currently, Dataiku offers the choice to either overwrite or append data during dataset updates, yet it lacks the capability for a user to perform an upsert on their data. An upsert operation, which merges the functions of updating and inserting, enables users to harmonize their existing dataset with new or modified data.…
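To make the requested behaviour concrete, here is a minimal plain-Python sketch of upsert semantics: rows whose key already exists are replaced, and rows with new keys are appended. The function name `upsert` and the sample data are hypothetical; this illustrates the idea only and is not how Dataiku or any particular database implements it.

```python
def upsert(existing, incoming, key_columns):
    """Merge incoming rows into existing rows, matching on key_columns.

    Matching rows are replaced wholesale by the incoming row; rows with
    unseen keys are appended. Original row order is preserved.
    """
    merged = {tuple(row[k] for k in key_columns): row for row in existing}
    for row in incoming:
        merged[tuple(row[k] for k in key_columns)] = row
    return list(merged.values())


# Hypothetical data: id 2 is updated, id 3 is inserted, id 1 is untouched.
current_rows = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
new_rows = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
upserted_rows = upsert(current_rows, new_rows, ["id"])
```

On SQL-backed datasets the equivalent is usually expressed with the database's own statement (for example `MERGE`, or `INSERT ... ON CONFLICT` in PostgreSQL), which avoids loading both datasets into memory.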
-
Maintain case of SQL table name when creating SQL datasets
Currently, when a SQL dataset is created, the name of the associated SQL table is set to PROJECTKEY_tablename regardless of the case of the SQL dataset name. It would be great if either the case of the dataset name was maintained in the SQL table name (so dataset ABC would result in a SQL table name of PROJECTKEY_ABC…
-
Idea: Include Associated Objects When Duplicating Dataset / Flows
Hello. When duplicating parts of a flow in Dataiku, the associated datasets are duplicated, but the charts built on those datasets are not included. This means that users have to manually recreate or copy these charts, which can be time-consuming and prone to errors. Benefits: Including the duplicate feature for…
-
Add support for Pandas 2.0
Pandas 2.0 can bring great performance improvements when using the pyarrow backend: https://towardsdatascience.com/pandas-2-0-a-game-changer-for-data-scientists-3cd281fcc4b4
-
Add option to support non-pandas dataframes (e.g. polars) in Python recipes
Hi, there are many pandas alternatives. One that is new and very fast is Polars. Polars is built on Rust, so it is memory safe and runs in parallel by design. I use Polars in one of my recipes but have to convert the result to pandas to write the dataset. Thanks.