-
How to stack columns from one dataset
Hi, here is a simplified schema of a basic dataset structure I need to reshape:

firstname  name      vote  col4    col5     col6  col7    col8     col9  etc..
ARTHAUD    Nathalie  5     ARMAND  Thierry  9     ARNAUD  Bernard  6     etc..
ARTHAUD    Nathalie  7     ARMAND  Thierry  3     ARNAUD  Bernard  8     etc..

The number of columns is variable, but it will always be a…
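One possible way to stack such repeated column groups is sketched below in plain Python. It assumes each group is three columns wide (firstname, name, vote), which is only inferred from the sample above. The names `stack_groups` and `GROUP_WIDTH` are hypothetical helpers, not part of any Dataiku API.

```python
# Assumption: every repeated block is (firstname, name, vote), i.e. 3 columns wide.
GROUP_WIDTH = 3


def stack_groups(rows, group_width=GROUP_WIDTH):
    """Turn wide rows made of repeated column groups into one row per group."""
    stacked = []
    for row in rows:
        # Walk the row in steps of group_width and cut out each group.
        for start in range(0, len(row), group_width):
            group = list(row[start:start + group_width])
            # Skip a trailing incomplete group and groups that are entirely empty.
            is_complete = len(group) == group_width
            has_data = any(cell not in ("", None) for cell in group)
            if is_complete and has_data:
                stacked.append(group)
    return stacked


# Sample data copied from the post; real rows may contain more groups.
wide_rows = [
    ["ARTHAUD", "Nathalie", 5, "ARMAND", "Thierry", 9, "ARNAUD", "Bernard", 6],
    ["ARTHAUD", "Nathalie", 7, "ARMAND", "Thierry", 3, "ARNAUD", "Bernard", 8],
]
long_rows = stack_groups(wide_rows)
```

Because the loop steps through each row by `group_width`, the same function handles rows with any number of groups, as long as the group width itself stays fixed.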
-
CHAR(1) columns turning into lengths of 2 with spaces in Exports
We have to use data prep steps on columns from many database tables on different platforms that are defined as CHAR(1), just to keep them at a length of 1. Otherwise, the exports change them to a length of 2 by adding a trailing space. So, an indicator column with only "Y" or "N" becomes "Y " (added space) or "N " (added space). Using a data…
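As an illustration of the workaround described, here is a minimal plain-Python sketch that strips the trailing padding from selected columns after export. It assumes rows are available as dictionaries; `strip_char_padding` and the column name `active_flag` are hypothetical and not taken from the post.

```python
def strip_char_padding(rows, columns):
    """Return copies of the rows with trailing spaces removed from the given columns."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the exported rows are left untouched
        for col in columns:
            value = new_row.get(col)
            # Only string values can carry CHAR padding; leave other types alone.
            if isinstance(value, str):
                new_row[col] = value.rstrip(" ")
        cleaned.append(new_row)
    return cleaned


# Hypothetical exported rows where a CHAR(1) flag came back padded to length 2.
exported = [
    {"id": 1, "active_flag": "Y "},
    {"id": 2, "active_flag": "N "},
]
cleaned = strip_char_padding(exported, ["active_flag"])
```

This only cleans the data after the fact; it does not address why the export pads the values in the first place.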
-
How to find unused shared-in datasets of a project with the Python API?
Thanks in advance for your time. I have a project and I want to know which datasets are shared in from other projects (black icons) using the Python API. However, those shared-in datasets can be separated into 2 groups: ① unused: just shown in the zone for checking ② used: used by a recipe for analysis I…
-
Add option to support non-pandas dataframes (e.g. polars) in Python recipes
Hi, there are many pandas alternatives. One that is new and very fast is polars. Polars is built on Rust, so it is memory safe and runs in parallel by design. I use polars in one of my recipes but have to convert the dataframe to pandas to write the dataset. Thanks
-
Change Auto-Typing to an off or on option with default “Off”
I would like Auto-Typing to be an option that can be turned on and off, with the default being “Off”. This feature changes my unit serial numbers (230836735F) to a Float (2.30836735E8), which causes me to lose records when joining on the unit serial number field in a following step. This will cause my…
-
Paste list in interim table filter
I would like to be able to copy a list of values from Excel and paste it into the interim table filter when using the "Is any of the strings" option, instead of having to enter them one at a time. This helps when troubleshooting workflows where you are looking for multiple records.
-
Ctrl + Enter to run a recipe
It would be great to be able to use the shortcut key combination Ctrl + Enter to run a recipe while in the recipe editor screen. This keyboard shortcut would be consistent with what you can do in both Jupyter Notebooks and in SQL Notebooks. I realize that there is a current keyboard shortcut for running a recipe (@ run)…
-
Make Dataiku Managed Datasets Less Opinionated (aka stop dropping my tables)
After 11.4.0 (or earlier, as we upgraded from 11.0.3), Dataiku now defaults to dropping and re-creating the table when using the Dataset Python APIs if, for some reason, the dataset schema and the underlying table do not match. It does this silently and the jobs pass, so only later do we find out that we've lost our history in the base…
-
Perform quick SQL query on SQL dataset from UI
For my workflow, it would be very helpful to have the option to run a quick SQL query on a (SQL) dataset in the Flow from the UI, for example by right-clicking it. I mean things like counting the distinct values of a specific column, etc. Right now, I go to my separate SQL client to perform these quick checks, but that requires tool…
-
The recipe execution is taking a long time due to handling a large volume of data in Dataiku
We are experiencing long execution times for a recipe in Dataiku due to handling large datasets. While we have implemented partitioning using a filter on a specific column, it still takes 1.5-2 hours to partition 30M records. Is there a more efficient way to handle and process this data quickly and effectively because…