-
Add Granger Causality tests to the stats worksheet
I'd really like to be able to test Granger causality between two or more time series. Would it be possible to add it to the stats page, so that I can pick two or more input columns and have the Granger causality calculated for each pairing and each ordering, over a specified range of lags?
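To make the size of the request concrete, here is a minimal sketch of how the tests could be enumerated. The function name `granger_test_plan` and its parameters are hypothetical; this only lists the (cause, effect, lag) combinations and does not run any statistical test.

```python
from itertools import permutations


def granger_test_plan(columns, min_lag, max_lag):
    """List every (cause, effect, lag) combination to test.

    Hypothetical helper: permutations() yields both orderings of each
    pair, because "A Granger-causes B" and "B Granger-causes A" are
    separate hypotheses.
    """
    return [
        (cause, effect, lag)
        for cause, effect in permutations(columns, 2)
        for lag in range(min_lag, max_lag + 1)
    ]


plan = granger_test_plan(["sales", "visits", "price"], 1, 2)
```

With n columns and L lags this produces n * (n - 1) * L tests, so the number of results grows quickly as columns are added.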
-
"Fold" processors in visual recipe - Implement In-Database engine
Today, fold processors require the DSS engine because they are not supported for in-database processing, which forces Dataiku designers to write SQL recipes to perform fold operations. Most modern databases support "unpivot" syntax, which would allow fold processors to be converted to SQL.…
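As an illustration of how a fold could be expressed in SQL, here is a minimal sketch that generates a `UNION ALL` query, a portable fallback for databases without native unpivot syntax. The function `fold_to_sql` and the output column names are assumptions for this example, not how Dataiku does or would implement it, and identifiers are not escaped.

```python
import sqlite3


def fold_to_sql(table, id_cols, fold_cols,
                name_col="fold_name", value_col="fold_value"):
    """Build one SELECT per folded column and stack them with UNION ALL.

    Hypothetical helper: assumes trusted identifiers (no quote escaping).
    """
    selects = []
    for col in fold_cols:
        parts = ['"%s"' % c for c in id_cols]
        parts.append("'%s' AS \"%s\"" % (col, name_col))   # column name as a value
        parts.append('"%s" AS "%s"' % (col, value_col))    # column content
        selects.append('SELECT %s FROM "%s"' % (", ".join(parts), table))
    return "\nUNION ALL\n".join(selects)


# Usage on an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, q1 INTEGER, q2 INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, 10, 20), (2, 30, 40)])
query = fold_to_sql("sales", ["id"], ["q1", "q2"]) + "\nORDER BY 1, 2"
folded = conn.execute(query).fetchall()
```

The same idea maps onto a native `UNPIVOT` clause where the database offers one; the `UNION ALL` form scans the table once per folded column, so it is likely slower on wide folds.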
-
API: load a request from a Postman/Bruno collection
Hello all, Configuring a REST API by hand can be painful. On the other hand, the API world makes heavy use of tools such as Postman or Bruno (an open source clone), which allow easy testing, debugging... I use one every time I have to work on a REST API, and then I try to translate it to the final tool. Both tools offer a "collection", a set…
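For illustration, here is a minimal sketch of reading requests out of a collection export. It assumes the Postman Collection v2.1 JSON layout (nested `item` lists, each request carrying `method`, `url`, and `header`); Bruno's native file format is not covered, and the function names are hypothetical.

```python
import json


def iter_requests(items, prefix=""):
    """Yield one dict per request, recursing into folders.

    Assumes Postman Collection v2.1 JSON; hypothetical helper.
    """
    for item in items:
        name = prefix + item.get("name", "")
        if "item" in item:  # a folder holds its own list of items
            yield from iter_requests(item["item"], prefix=name + "/")
        elif "request" in item:
            req = item["request"]
            if isinstance(req, str):  # shorthand form: the request is just a URL
                req = {"url": req}
            url = req.get("url", "")
            if isinstance(url, dict):  # expanded form keeps the full URL in "raw"
                url = url.get("raw", "")
            headers = {h["key"]: h["value"] for h in req.get("header", [])}
            yield {
                "name": name,
                "method": req.get("method", "GET"),
                "url": url,
                "headers": headers,
            }


def load_collection(text):
    """Parse exported collection JSON text into a flat list of requests."""
    return list(iter_requests(json.loads(text).get("item", [])))
```

Each returned dict holds the pieces a REST connection form would need (method, URL, headers), so a user could pick one request from the list instead of retyping it.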
-
ADBC connectivity: faster columnar storage queries
Hello all, ADBC is a database connection standard (like ODBC or JDBC) but specifically designed for columnar storage (so databases like DuckDB, ClickHouse, MonetDB, Vertica...). This is typically the kind of thing that can make Dataiku way faster. Here is a benchmark made by the guys at DuckDB: 38x improvement…
-
Properly implement support for Building Flow Zones in Scenarios and the Dataiku API
In Dataiku v12.0.0 a new feature was added that allows users to build flow zones from the flow UI: https://knowledge.dataiku.com/latest/data-preparation/pipelines/tutorial-build-modes.html#build-a-flow-zone This works well; however, this capability was never properly added to Scenarios or to the Dataiku API. In 12.1.0…
-
Smart indexing: recommend index based on downstream recipes
It would be helpful if on the index selection menu for a dataset, some smart values could be displayed based on downstream recipes, and if in the recipe creation views, upstream datasets could be reindexed to optimize them as well. For example, after I've created two join recipes downstream of a dataset, on the index…
-
Project folders should be able to manage permissions for underlying projects.
Hi everyone, I’d like to suggest an improvement for Dataiku's folder and project permission management. I find it strange that Dataiku doesn’t inherit folder permissions into project permissions. When project folders are set up for different teams of entities, it shouldn't just be a visual organisation on the…
-
Cartesian product detection in join recipe
What's your use case? A Cartesian product is a common issue when joining datasets on a bad key. It's not always easy to detect, and users can even forget to check for it because they think they know their data. What's your proposed solution? What I suggest is an option to check if there will be a Cartesian product on the…
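One way such a check could work, sketched under the assumption that a "Cartesian product" here means join keys that repeat on both sides: count key occurrences in each input and flag the keys duplicated in both. The function `join_fanout` is hypothetical and operates on in-memory lists of dicts.

```python
from collections import Counter


def join_fanout(left_rows, right_rows, key_cols):
    """Report join keys duplicated on both sides and the inner-join row count.

    Hypothetical helper: a key seen m times on the left and n times on
    the right contributes m * n output rows.
    """
    left = Counter(tuple(r[c] for c in key_cols) for r in left_rows)
    right = Counter(tuple(r[c] for c in key_cols) for r in right_rows)
    shared = left.keys() & right.keys()
    many_to_many = {
        k: (left[k], right[k])
        for k in shared
        if left[k] > 1 and right[k] > 1
    }
    return {
        "many_to_many_keys": many_to_many,
        "inner_join_rows": sum(left[k] * right[k] for k in shared),
    }


orders = [{"cust": 1}, {"cust": 1}, {"cust": 2}]
visits = [{"cust": 1}, {"cust": 1}, {"cust": 3}]
report = join_fanout(orders, visits, ["cust"])
```

Here customer 1 appears twice on each side, so the join emits four rows for it; comparing `inner_join_rows` with the input sizes gives a quick warning before the recipe runs. In a database the same counts could come from two `GROUP BY` queries.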
-
Allow nested flow zones
Hi, I use flow zones a lot and appreciate the value. Why not extend the capability and allow nested flow zones, i.e. a flow zone within a flow zone? thx
-
Unique key detector tool
What's your use case? More often than not, you have to deal with a dataset without knowing what makes a row unique. This can lead to misinterpreting the data, Cartesian products at join time, and other funny stuff. What's your proposed solution? This is a feature I haven't seen in any data preparation/ETL tool. The core feature is to detect…
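As a rough sketch of what such a detector might do, assuming the goal is to find the smallest column combinations whose values never repeat: try combinations of increasing size and keep those with no duplicate tuples. The function `find_unique_keys` and the `max_size` cap are hypothetical, and this brute-force search is only practical on a sample or a small number of columns.

```python
from itertools import combinations


def find_unique_keys(rows, columns, max_size=3):
    """Return minimal column combinations that uniquely identify each row.

    Hypothetical helper: brute force, values must be hashable.
    """
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            seen = set()
            for row in rows:
                value = tuple(row[c] for c in combo)
                if value in seen:
                    break  # duplicate found, combo is not a key
                seen.add(value)
            else:
                keys.append(combo)  # loop finished without a duplicate
    return keys


rows = [
    {"a": 1, "b": 1, "c": "x"},
    {"a": 1, "b": 2, "c": "x"},
    {"a": 2, "b": 1, "c": "y"},
]
candidate_keys = find_unique_keys(rows, ["a", "b", "c"])
```

No single column is unique in this sample, but the pairs `("a", "b")` and `("b", "c")` are, so both are reported as candidate keys.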