-
How to retrieve the test dataset used by a trained model with Python?
Hello everyone, I am working on Dataiku, primarily using their API. I have trained my model and would like to retrieve the dataset that was used for testing via the API methods. Despite trying several methods, including get_train_info(), I am unable to obtain the test dataset. I don't want to export it; I just want to…
-
Is there a way to check whether Python code is run in a recipe vs. a notebook?
I recently bumped into an issue where my Python code did not behave the same way depending on whether it was run from a notebook or from its corresponding recipe*. I eventually used the following function, but I was wondering whether there is a native function in Dataiku to detect the running environment. def in_ipynb(): try: get_ipython()…
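The helper in the post is cut off. Here is a minimal sketch of one common IPython-based approach; it is not a Dataiku API. Inside a Jupyter kernel, `get_ipython()` is defined and returns a shell whose class is `ZMQInteractiveShell`. In a plain Python process it is undefined and raises `NameError`. That a Dataiku recipe runs outside IPython, and that every notebook front end reports this class name, are assumptions worth verifying on your own instance.

```python
def in_ipynb() -> bool:
    """Return True when running inside a Jupyter (IPython kernel) notebook."""
    try:
        # get_ipython is injected into builtins only while an IPython shell runs
        shell = get_ipython().__class__.__name__  # noqa: F821
    except NameError:
        # Plain Python interpreter, e.g. a script or (assumed) a recipe process
        return False
    # Jupyter kernels use ZMQInteractiveShell; terminal IPython uses TerminalInteractiveShell
    return shell == "ZMQInteractiveShell"


if __name__ == "__main__":
    print("notebook" if in_ipynb() else "not a notebook")
```

Checking the shell class rather than only catching `NameError` keeps a terminal IPython session from being reported as a notebook.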
-
How to display HTML in managed folder on dashboard?
Hi, I have created an interactive HTML file with embedded data. Opening it in any browser works fine. I uploaded the file and tried to display it in a dashboard using both Managed Folder and Web Content tiles. The managed folder tile just shows the HTML text when the file is selected. The Web Content tile requires a URL, it…
-
Propagate schema fails at a partitioned sync recipe
I added a column to the start of an SQL pipeline and needed to ensure that the schema change was propagated to subsequent tables. Thankfully, the DSS "propagate schema" functionality allowed me to automate this task. However, at some point I encountered…
-
Comparing two recipes to determine if one is different?
Hi, we create Dataiku project templates that give the user known-good recipes and flow zones to accomplish a certain task. The project has variables that we use in the recipes to parameterize them, so the project can be run with different settings. My question is whether there is a way I could loop through the recipes in the…
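One way to frame the comparison is to treat each recipe's definition as a JSON-like dict and list the paths where a project copy differs from the template. The sketch below assumes such dicts are already in hand; fetching them through the Dataiku API is not shown. The `diff_paths` helper and the sample `template`/`current` definitions are hypothetical.

```python
from typing import Any, List


def diff_paths(a: Any, b: Any, path: str = "") -> List[str]:
    """Return the paths where two JSON-like structures (dicts, lists, scalars) differ."""
    diffs: List[str] = []
    if isinstance(a, dict) and isinstance(b, dict):
        for k in sorted(set(a) | set(b), key=str):
            child = f"{path}.{k}" if path else str(k)
            if k not in a or k not in b:
                # Key present on only one side
                diffs.append(child)
            else:
                diffs.extend(diff_paths(a[k], b[k], child))
        return diffs
    if isinstance(a, list) and isinstance(b, list):
        if len(a) != len(b):
            return [path or "<root>"]
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(diff_paths(x, y, f"{path}[{i}]"))
        return diffs
    # Scalars, or mismatched container types: compare directly
    return [] if a == b else [path or "<root>"]


# Hypothetical recipe definitions: a known-good template and a project copy
template = {"type": "sync", "params": {"mode": "overwrite", "vars": ["${region}"]}}
current = {"type": "sync", "params": {"mode": "append", "vars": ["${region}"]}}
print(diff_paths(template, current))  # ['params.mode']
```

An empty list means the recipe still matches the template. Because project variables stay as `${...}` placeholders in the definition, changing their values would not show up as a difference.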
-
Getting Gini variable importance via API?
Following this post for guidance: https://community.dataiku.com/t5/Using-Dataiku/How-to-get-Variable-Importance-from-Model/m-p/3589 I found this documentation: https://developer.dataiku.com/latest/api-reference/python/ml.html#exploration-of-results which documents a function called:…
-
Monthly Partitioning changes partition column value
I am trying to set up monthly partitioning on a date column in my Snowflake database. I have the source table and output dataset set to monthly partitioning. In the middle I have a Prepare recipe where I use the time range to get a month (screenshot below), and the output of the posting_date field changes from an actual date,…
-
Build several partitions in one go
Hi, I want to synchronize an Oracle table of 1 billion rows to another Oracle table. The query runs for a very long time and I end up with the following Oracle error: [11:06:27] [INFO] [dku.output.sql] - appended 178620000 rows, errors=0 [11:06:27] [INFO] [dku.utils] - Closing oracle.jdbc.driver.T4CConnection@7fc1cb4f [11:06:27] [INFO]…
-
Create N output datasets dynamically
Hi, I have a dataset that I want to partition into N datasets, where N will change over time. N is greater than 30, so I don't want to manually declare each output dataset in my Python recipe. It is easy enough in Python to create the N dataframes I want to use as the source for each dataset. Can I do this dynamically without…
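The part the poster calls easy, splitting one table into N subsets whose count is not known in advance, can be sketched with the standard library alone. The `split_by_key` helper and the sample `rows` are hypothetical. Writing each group to its own output dataset without declaring it by hand is the open question in the post and is not covered here.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def split_by_key(rows: Iterable[Mapping[str, Any]], key: str) -> Dict[Any, List[Mapping[str, Any]]]:
    """Group rows by the value of `key`; the number of groups (N) follows the data."""
    groups: Dict[Any, List[Mapping[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Return a plain dict so missing keys raise KeyError instead of creating empty groups
    return dict(groups)


# Hypothetical sample data
rows = [
    {"region": "north", "sales": 10},
    {"region": "south", "sales": 7},
    {"region": "north", "sales": 3},
]
groups = split_by_key(rows, "region")
print(len(groups))  # 2
```

Because N is derived from the distinct key values, it grows or shrinks with the data, which is why a fixed list of declared outputs becomes hard to maintain.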
-
Relationship between Dataiku partitioning vs S3 partitioning
Hi All, Suppose I have a dataset in S3 partitioned and stored as parquet. Suppose I read it into Dataiku as a partitioned dataset (or try to sync it from somewhere else). What is the relationship between the two partitions? If I use the same partition key in Dataiku as S3, will Dataiku recognize that and avoid…