-
List managed folders from project
Currently, the only way to view which managed folders are associated with a project is to check the flow. However, on large projects, the flow is too large to load. (On my project of just 7,000 datasets, the flow crashes the browser tab.) Datasets and recipes can be listed in the datasets and recipes pages, but managed…
-
Comments in Formula
User Story: As a creator of formulas in Dataiku, I would like to be able to add comments in formulas. This would let me leave information in formulas about why they are configured the way they are, increasing trust and communication, and it would make it possible to "comment out" chunks of code while…
-
Allow datasets to automatically reload schema when jobs run.
Currently, if columns in a dataset source are added or removed, jobs and scenarios that read from that dataset will fail until you reload the schema from the table, even if nothing downstream depends on the changed columns. We would like to see a setting to allow datasets to always reload schema when…
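A minimal sketch of the check this request implies, in plain Python rather than any Dataiku API: compare the stored schema with the columns the source currently exposes, and treat a reload as safe when no column needed downstream has disappeared. The function name `schema_drift` and its three arguments are hypothetical, chosen only for illustration.

```python
def schema_drift(stored_columns, source_columns, required_columns):
    """Compare a stored schema with the live source columns.

    All three arguments are plain lists of column names (hypothetical inputs,
    not read from DSS). A reload is considered safe when every column that
    downstream steps require still exists in the source.
    """
    stored = set(stored_columns)
    source = set(source_columns)
    missing_required = sorted(set(required_columns) - source)
    return {
        "added": sorted(source - stored),      # new in the source table
        "removed": sorted(stored - source),    # dropped from the source table
        "missing_required": missing_required,  # dropped columns that downstream needs
        "safe_to_reload": not missing_required,
    }


report = schema_drift(
    stored_columns=["id", "name", "legacy_code"],
    source_columns=["id", "name", "region"],
    required_columns=["id", "name"],
)
print(report)
```

In this example one column was added and one removed, but neither is required downstream, so the job could reload the schema and continue instead of failing.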
-
Ability to choose input data set for copied and pasted subflows
I often have to copy a portion of a flow to use in a different section. Having the ability to define my input data would make things more efficient and eliminate some human error. In the use case I have, I want to copy the portion circled in red and paste it to where the green circle is, but I don't want it to branch off…
-
Managed-datasets Metadata Synchronization Across Multiple DSS Instances
Use Case: As an organization, we utilize three distinct DSS instances to manage our data analytics and ML workflows: * Self-Service and Data Products Consumption Instance: For end-users to consume data products, and work independently by having access to curated data. * Design and Development Instance: For designing and…
-
Ability to zip files from one folder to another
A business user in my team is trying to upload daily pulls to an SFTP server. These files are created by separate Snowflake queries, then merged using a Merge Folder recipe. The business user would like to be able to zip these files into a single folder before uploading to SFTP (3rd party requirement). Currently they are using a…
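As a rough illustration of the requested step, the sketch below zips every file in one folder into a single archive written to a second folder, using only the Python standard library. It assumes both folders are plain local directories; the function name `zip_folder` and the default archive name `daily_pull.zip` are hypothetical, and a managed folder backed by remote storage would need to be read and written through its own API instead.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folder(source_dir, target_dir, archive_name="daily_pull.zip"):
    """Compress every file under source_dir into one archive in target_dir.

    target_dir should not be inside source_dir, otherwise the archive
    would try to include itself.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    archive_path = target / archive_name
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to the source folder, not absolute paths.
                archive.write(path, arcname=path.relative_to(source).as_posix())
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        merged = Path(tmp) / "merged"
        merged.mkdir()
        (merged / "query_a.csv").write_text("id\n1\n")
        (merged / "query_b.csv").write_text("id\n2\n")
        print(zip_folder(merged, Path(tmp) / "zipped"))
```

The resulting archive is a single file, which is what an SFTP upload step would then pick up.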
-
Multiple Tabs Within a Project
Hi, my name is Yusuf Afolabi, and I work for Caterpillar as a data scientist. I use Flow Zones a lot and they have been very helpful. Recently, I have been running into situations where navigating to a specific Flow Zone becomes problematic. Think of having like 10 different Flow Zones in a project: you would have to scroll back and…
-
Feature Upgrade Request for Dataiku - related Vector DB, PII detection
Currently, we are proposing Dataiku as a Generative AI platform to one of our key clients. During the solution evaluation process, the client identified two key functionalities that are not yet supported by Dataiku. If these features are added, I am confident that they will significantly contribute to securing a new logo.…
-
Remove the 30 char table/column name limitation for Oracle datasets
Oracle has extended the table/column name length limit from 30 bytes to 128 bytes (since 12.2, so this includes 19c), but DSS (ver 9) still enforces the old 30-character limit for Oracle datasets. I hope this limit can be removed in future versions, since everybody is on 19c or higher now.
-
Cartesian product detection in join recipe
What's your use case? A Cartesian product is a common issue when joining datasets on a bad key. It's not always easy to detect, and users can even forget to check for it because they think they know their data. What's your proposed solution? What I suggest is an option to check if there will be a Cartesian product on the…
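One way such a check could work, sketched in plain Python and not taken from the join recipe itself: count how often each key value occurs on both sides before joining. Any key value that is duplicated on both sides multiplies rows, and the inner-join row count can be computed without running the join. The function name `join_fanout` and the row format (lists of dicts) are assumptions made for this example.

```python
from collections import Counter


def join_fanout(left_rows, right_rows, key):
    """Estimate inner-join fan-out from key frequencies on each side."""
    left_counts = Counter(row[key] for row in left_rows)
    right_counts = Counter(row[key] for row in right_rows)
    shared = left_counts.keys() & right_counts.keys()
    # An inner join emits left_count * right_count rows per shared key value.
    joined_rows = sum(left_counts[k] * right_counts[k] for k in shared)
    # Key values repeated on BOTH sides are the ones that multiply rows.
    many_to_many = sorted(
        (k for k in shared if left_counts[k] > 1 and right_counts[k] > 1),
        key=repr,
    )
    return {
        "left_rows": len(left_rows),
        "right_rows": len(right_rows),
        "joined_rows": joined_rows,
        "many_to_many_keys": many_to_many,
    }


left = [{"id": 1}, {"id": 1}, {"id": 2}]
right = [{"id": 1}, {"id": 1}, {"id": 1}, {"id": 3}]
print(join_fanout(left, right, "id"))
```

Here the key value 1 appears twice on the left and three times on the right, so the join would produce 6 rows from inputs of 3 and 4 rows, which is the kind of blow-up a warning could flag.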