-
Add Venn diagram and UpSet plot to Charts
I'm encountering some use cases where I want to easily visualize the number of records belonging to one or several groups, and their overlap, where group membership is spread over multiple 1/0 columns. It would be super handy to have Venn diagrams in the Charts or, sometimes even better, UpSet plots.
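Here is a minimal sketch of the counts such a chart would display. It assumes the records are in a pandas DataFrame with one 1/0 column per group, and `intersection_counts` is a hypothetical helper name. Each row is assigned to exactly one membership pattern, and the patterns are counted. These exclusive counts are what the bars of an UpSet plot, or the regions of a Venn diagram, represent.

```python
import pandas as pd

def intersection_counts(df, group_cols):
    """Count records per exclusive membership pattern (hypothetical helper)."""
    members = df[group_cols].astype(bool)
    # Label each row with the groups it belongs to, e.g. "A & B"; rows in no group get "(none)"
    labels = members.apply(
        lambda row: " & ".join(c for c in group_cols if row[c]) or "(none)",
        axis=1,
    )
    return labels.value_counts()

df = pd.DataFrame({
    "A": [1, 1, 0, 0, 1, 1],
    "B": [1, 0, 1, 0, 1, 1],
    "C": [0, 0, 0, 0, 0, 1],
})
counts = intersection_counts(df, ["A", "B", "C"])
```

Third-party packages such as `upsetplot` can then draw the chart. The grouping step above is the part a built-in chart would need to do from the 1/0 columns.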
-
Select Columns Outside of Join Recipe
I would like to be able to select the columns of data outside of a join recipe. A couple of examples: 1 - Usage of "unmatched rows": the column selection that occurs after the join does not apply to data that isn't joined. In this case I am using both sets of data, so I need the option to select columns from both sets. 2 - Removal…
-
Have a Dataiku templating engine based on Python Mako or Jinja
Hi, Python-based templating engines like Jinja and Mako allow users to 'print' text in various formats, using conditional logic such as if-else statements and for loops. I think Dataiku should offer an off-the-shelf Python-based templating engine that would allow users to upload their template(s) and pass a `context dict` to…
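To illustrate the template-plus-context workflow the idea describes, here is a minimal sketch using the Jinja2 library. It assumes `jinja2` is installed. The template text, the `render_report` function and the context keys are hypothetical examples, not a Dataiku feature.

```python
from jinja2 import Template

# Hypothetical template: a loop plus an if-else, the constructs mentioned above
TEMPLATE_SOURCE = """Report for {{ project }}
{% for ds in datasets -%}
- {{ ds.name }}: {{ ds.rows }} rows{% if ds.rows == 0 %} (EMPTY){% endif %}
{% endfor %}"""

def render_report(context):
    # Render the template with a plain context dict
    return Template(TEMPLATE_SOURCE).render(context)

output = render_report({
    "project": "Sales",
    "datasets": [{"name": "orders", "rows": 120}, {"name": "returns", "rows": 0}],
})
```

Jinja's `ds.name` syntax falls back to dictionary lookup, so plain dicts work as context values.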
-
Edit default metrics and checks as a project-wide setting
When creating a new dataset, I practically always edit the default metrics and checks to run row counts after build. Ideally, I could define this from the project settings so that every new dataset created automatically has my desired metrics and checks configured. Of course, this doesn't apply to column-specific values,…
-
Option to rearrange output columns in join recipe
I would like to have the option to rearrange output columns in the join recipe. Perhaps by making the 'hamburger' icons on the Output panel draggable.
-
Properly implement support for Building Flow Zones in Scenarios and the Dataiku API
In Dataiku v12.0.0, a new feature was added that allows users to build Flow Zones from the Flow UI (https://knowledge.dataiku.com/latest/data-preparation/pipelines/tutorial-build-modes.html#build-a-flow-zone). This works well; however, this capability was never properly added to Scenarios or to the Dataiku API. In 12.1.0…
-
Smart indexing: recommend index based on downstream recipes
It would be helpful if the index selection menu for a dataset displayed smart suggestions based on downstream recipes, and if the recipe creation views let users reindex upstream datasets to optimize them as well. For example, after I've created two join recipes downstream of a dataset, on the index…
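One simple way such a recommendation could work is to rank columns by how often downstream recipes use them as keys. The sketch below assumes that approach. The recipe description format (`{"type": ..., "keys": [...]}`) and the `recommend_index_columns` name are hypothetical and not part of the Dataiku API.

```python
from collections import Counter

def recommend_index_columns(downstream_recipes, top_n=3):
    """Rank columns by how often downstream recipes use them as keys (hypothetical)."""
    counter = Counter()
    for recipe in downstream_recipes:
        # e.g. join keys or group-by keys extracted from each recipe
        counter.update(recipe.get("keys", []))
    return [col for col, _ in counter.most_common(top_n)]

recipes = [
    {"type": "join", "keys": ["customer_id"]},
    {"type": "join", "keys": ["customer_id", "order_date"]},
    {"type": "group", "keys": ["region"]},
]
suggested = recommend_index_columns(recipes)
```

A real implementation would probably also weight by recipe type and by the database engine's indexing rules.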
-
Looking to replicate a SUM(COUNTIF) formula in Dataiku
I am working on a scorecard in Dataiku and I would like to calculate the percentage of completion in a set number of columns. Basically, I would like to replicate this formula in Excel: =SUM(COUNTIF(ColumnX:ColumnXX,"*")/Total Number of Columns) and am having issues. The columns are a mix of strings, integers, and text,…
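In a Python recipe, one way to get a per-row completion percentage is the pandas sketch below. It assumes the data is loaded as a pandas DataFrame, and `completion_pct` is a hypothetical name. It counts a cell as filled if it is non-null and not a blank string, whatever its type. Excel's `"*"` criterion in COUNTIF matches text cells only, so this sketch deliberately differs from the formula, which seems closer to the goal with mixed column types.

```python
import pandas as pd

def completion_pct(df, cols):
    """Percentage of non-empty cells per row across `cols` (hypothetical helper)."""
    # A cell counts as filled if it is not null and not a blank/whitespace string
    filled = df[cols].apply(lambda s: s.notna() & (s.astype(str).str.strip() != ""))
    return filled.sum(axis=1) / len(cols) * 100

df = pd.DataFrame({
    "a": ["x", "", None],
    "b": [1, 2, None],
    "c": ["y", None, "z"],
})
df["completion"] = completion_pct(df, ["a", "b", "c"])
```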
-
Unique key detector tool
What's your use case? More often than not, you have to deal with a dataset without knowing what makes a row unique. This can lead to misinterpreting the data, Cartesian products at join time, and other funny stuff. What's your proposed solution? This is a feature I haven't seen in any data preparation/ETL tool. The core feature is to detect…
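The core detection step could look like the brute-force pandas sketch below. It assumes the dataset fits in a DataFrame, and `find_unique_keys` is a hypothetical name. It checks every column combination up to `max_cols` and keeps only minimal keys, skipping any combination that already contains a found key. The cost grows combinatorially with the column count, so a real tool would likely need sampling or pruning.

```python
from itertools import combinations
import pandas as pd

def find_unique_keys(df, max_cols=3):
    """Return minimal column combinations that uniquely identify rows (hypothetical)."""
    keys = []
    for size in range(1, max_cols + 1):
        for combo in combinations(df.columns, size):
            # Skip supersets of keys already found: they are unique but not minimal
            if any(set(k) <= set(combo) for k in keys):
                continue
            # No duplicated rows on these columns means the combination is a key
            if not df.duplicated(subset=list(combo)).any():
                keys.append(combo)
    return keys

df = pd.DataFrame({"id": [1, 2, 3], "a": [1, 1, 2], "b": ["x", "y", "x"]})
keys = find_unique_keys(df)
```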
-
Data Upsert
Currently, Dataiku offers the choice to either overwrite or append data during dataset updates, yet it lacks the capability for a user to perform an upsert on their data. An upsert operation, which merges the functions of updating and inserting, enables users to harmonize their existing dataset with new or modified data.…
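The upsert semantics can be sketched in pandas, assuming both datasets share a key column and `updates` has no duplicate keys. The `upsert` function name is hypothetical. Rows whose key appears in `updates` are replaced, and new keys are appended. In SQL databases, the same operation is usually expressed with `MERGE` or, in PostgreSQL, `INSERT ... ON CONFLICT ... DO UPDATE`.

```python
import pandas as pd

def upsert(existing, updates, key):
    """Update rows with matching keys and insert new ones (hypothetical helper)."""
    combined = pd.concat([existing, updates], ignore_index=True)
    # keep="last" keeps the row from `updates` when a key appears in both frames
    return combined.drop_duplicates(subset=key, keep="last").reset_index(drop=True)

existing = pd.DataFrame({"id": [1, 2, 3], "val": ["a", "b", "c"]})
updates = pd.DataFrame({"id": [2, 4], "val": ["B", "D"]})
result = upsert(existing, updates, ["id"]).sort_values("id")
```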