-
Implement Sampling > Random as Engine:In-Database(SQL) for Snowflake
Currently, if I select Sampling method: Random (approx. ratio) or Random (approx. nb. records), the only allowed engine is DSS, which requires downloading the input dataset to DSS. Sampling can be done on the Snowflake side with https://docs.snowflake.com/en/sql-reference/constructs/sample For Random(approx.…
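A minimal sketch of what pushing the sampling down to Snowflake could look like, assuming the two DSS sampling methods map onto the two forms of Snowflake's SAMPLE clause (probability percentage and fixed row count). The helper name `snowflake_sample_query` and this mapping are illustrative assumptions, not DSS behavior. Note that the ROWS form returns a fixed number of rows rather than an approximate one.

```python
def snowflake_sample_query(table, ratio=None, nb_records=None):
    """Build a Snowflake query that samples `table` in the database.

    Hypothetical helper for illustration only. `table` is interpolated
    as-is, so it must be a trusted, already-quoted identifier.
    """
    # Exactly one of the two sampling modes must be chosen.
    if (ratio is None) == (nb_records is None):
        raise ValueError("Provide exactly one of ratio or nb_records")

    if ratio is not None:
        if not 0 <= ratio <= 1:
            raise ValueError("ratio must be between 0 and 1")
        # SAMPLE BERNOULLI (p) keeps each row with probability p percent.
        return f"SELECT * FROM {table} SAMPLE BERNOULLI ({ratio * 100:g})"

    if int(nb_records) < 0:
        raise ValueError("nb_records must be non-negative")
    # SAMPLE (n ROWS) returns a fixed-size sample of n rows.
    return f"SELECT * FROM {table} SAMPLE ({int(nb_records)} ROWS)"
```

For example, `snowflake_sample_query("MY_TABLE", ratio=0.1)` builds a query that keeps each row with a 10% probability, without moving the full dataset out of Snowflake.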
-
Create Python APIs for Copy Action and Copy Subflow Action
Our objective is to create an automated tool to do regression testing on parts of the flow. We don't want to do full flow testing since it will take a significant amount of time to duplicate our target projects and their target dependencies. Also, testing the whole project would require much more compute than we want to allocate…
-
Enable round function to round to specific number of decimals
There doesn't seem to be a built-in formula language function that can round a number to a specific number of decimals. The existing round() function only rounds a number to the nearest integer. So we found two ways of doing this within the formula language (samples given rounding to 1 decimal place): Format to 1 decimal and…
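As a rough illustration of the arithmetic behind rounding to a given number of decimals when only integer rounding is available, here is a sketch in Python (not the DSS formula language, and not taken from the original post). The helper `round_to` is hypothetical; it scales the value, rounds to the nearest integer, and scales back.

```python
import math


def round_to(value, decimals):
    """Round `value` to `decimals` decimal places using integer rounding only.

    Hypothetical helper for illustration. Ties are rounded up (toward
    positive infinity), and the result is subject to the usual binary
    floating-point representation limits.
    """
    factor = 10 ** decimals
    # Shift the wanted decimals left of the point, round to an integer,
    # then shift back.
    return math.floor(value * factor + 0.5) / factor


print(round_to(3.14159, 1))  # 3.1
```

The same scale, round, unscale pattern applies to any environment whose round function only handles integers.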
-
Link to the concrete run of the sub-scenario
In the scenario run details, the link in the "run scenario" description takes me to the most recent sub-scenario run rather than the run that was actually triggered: Given I have a scenario "parent" that runs a sub-scenario "child" with the "Run Scenario" step and I run scenario "parent". In the "Last runs" of scenario, I can…
-
Prevent flow zones from being created with duplicate names
This appears to be by design, but it's possible to give several zones the same name. While I see zones have an internal ID when saved, there appear to be no checks for name clashes. As a result, you can have multiple zones with the same name, which is confusing for users. Obviously zones can be renamed to have unique…
-
Reorder column list in a Prepare Recipe Move Step
User Story: As a data analyst who likes their columns in a certain order for easy data evaluation, I would like to be able to reorder the column names in a move recipe step after initial recipe creation, in the same way I can re-order text replacements in a text replacement step. This would save time when I don't get the…
-
Allow project variables to be overridden at the Flow Zone level
Hi, We routinely have to override project variables at the Python recipe level. However, this gets tedious and it would be great if this could be done at the flow zone level. thx
-
Allow export of datasets to parquet file format
Hi, Would be great, from file size and export time perspectives, to allow users to export datasets as parquet. thx
-
Enhance Managed Folders APIs to be able to handle local-vs-non-local folders automatically
Dataiku supports creating Managed Folders over different storage layers including local storage, network storage, cloud storage (i.e. buckets) or even SharePoint. However, the way you deal with these folders depends on where the API client is located (inside or outside DSS) and where the storage is:…
-
Add filter capability to the run Scenario step / run after Scenario trigger
The run Scenario step / run after Scenario trigger shows all available scenarios, which in a large Dataiku instance makes for a very long drop-down list (see first image below). This idea is to add a filter box like the one used in the Run As scenario Settings so that you can easily search and find the desired…