-
Refresh partitions in DSS via API
Hi, we have added a new dataset to the project via the Python API, pointing it to an existing location in HDFS where partition folders are stored. (This location is managed by another DSS instance.) This kind of "import" of a read-only dataset works, but I did not find a way to "refresh" the list of partitions, i.e.…
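A minimal sketch of re-reading the partition listing through the public API; the project handle and `list_partitions()` call are based on the `dataikuapi` package, and the project/dataset names are placeholders:

```python
def list_dataset_partitions(project, dataset_name):
    """Return the partition identifiers DSS currently sees for a dataset.

    `project` is assumed to be a dataikuapi DSSProject handle, e.g.
    obtained via dataiku.api_client().get_project("MY_PROJECT")
    (project key and dataset name are examples, not from the thread).
    """
    dataset = project.get_dataset(dataset_name)
    # list_partitions() asks the backend for the current partition list
    return dataset.list_partitions()
```

Whether this call alone forces DSS to re-enumerate the HDFS folders, or only returns a cached listing, is worth verifying against your DSS version.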
-
How to programmatically refresh input dataset partitions with Snowflake?
Hi, I’m working with a Snowflake-partitioned dataset that serves as an input in my project flow. I’d like to automate the refresh of the partition listing, which is normally done manually using the "REFRESH PARTITIONS" button in the Metrics tab. We previously managed to do this with S3 using the…
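One hedged approach, assuming the `dataikuapi` DSSDataset handle and that `compute_metrics()` triggers the same recomputation as the Metrics tab (an assumption to verify against the "REFRESH PARTITIONS" button's behavior):

```python
def refresh_and_list_partitions(project, dataset_name):
    """Recompute dataset metrics, then return the partition listing.

    `project` is assumed to be a dataikuapi DSSProject handle; whether
    compute_metrics() refreshes the partition listing exactly like the
    "REFRESH PARTITIONS" button does is an assumption, not confirmed.
    """
    dataset = project.get_dataset(dataset_name)
    dataset.compute_metrics()         # recompute whole-dataset metrics
    return dataset.list_partitions()  # current partition identifiers
```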
-
The recipe execution is taking a long time due to handling a large volume of data in Dataiku
We are experiencing long execution times for a recipe in Dataiku due to handling large datasets. While we have implemented partitioning using a filter on a specific column, it still takes 1.5–2 hours to partition 30M records. Is there a more efficient way to handle and process this data quickly and effectively because…
-
How to retrieve the test dataset used in the trained model with Python?
Hello everyone, I am working on Dataiku, primarily using their API. I have trained my model and would like to retrieve the dataset that was used for testing via the API methods. Despite trying several methods, including get_train_info(), I am unable to obtain the test dataset. I don't want to export it; I just want to…
-
Is there a way to check if Python code is run in a recipe vs a notebook?
I recently bumped into an issue where my Python code was not executed the same way whether it was run from a notebook or its corresponding recipe*. I eventually used the following function, but I was wondering whether there is a native function in Dataiku to detect the running environment? def in_ipynb(): try: get_ipython()…
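The truncated helper above can be completed as below. `get_ipython()` is only defined inside an IPython/Jupyter kernel, so a `NameError` means the code is running as a plain process, e.g. a DSS recipe; note this detects the kernel, not Dataiku specifically:

```python
def in_ipynb():
    """Return True when running inside an IPython/Jupyter kernel."""
    try:
        get_ipython()  # injected as a builtin by IPython only
        return True
    except NameError:
        # No IPython kernel: e.g. a DSS recipe run as a plain process
        return False
```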
-
How to display HTML in managed folder on dashboard?
Hi, I have created an interactive HTML file with data embedded. Opening this in any browser works fine. I uploaded the file and tried to display it in a dashboard using both managed folder and Web Content tiles. The managed folder just shows the HTML text when the file is selected. The Web Content tile requires a URL, it…
-
Propagate schema fails at partitioning sync recipe
I added a column to the start of an SQL pipeline and needed to ensure that the schema change was propagated to subsequent tables. Thankfully, DSS's "propagate schema" functionality allowed me to automate this task. [Screenshot: Dataiku's convenient functionality to propagate upstream changes in schema] However, at some point I encountered…
-
Comparing two recipes to determine if one is different?
Hi, We create Dataiku project templates which give the user known-good recipes and flow zones to accomplish a certain task. The project has variables we use in the recipes to parameterize the usage so that it can be run with different settings. My question is whether there is a way to loop through the recipes in the…
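A hedged sketch of comparing recipes by hashing their payloads; it assumes the `dataikuapi` DSSRecipe handle, whose `get_settings().get_payload()` returns the recipe's code/JSON payload (worth confirming for the recipe types you use):

```python
import hashlib

def recipe_fingerprint(recipe):
    """Hash a recipe's payload so two recipes can be compared cheaply.

    `recipe` is assumed to be a dataikuapi DSSRecipe handle; get_payload()
    may return None for some recipe types, hence the fallback to "".
    """
    payload = recipe.get_settings().get_payload() or ""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def differing_recipes(template_project, user_project, recipe_names):
    """Return the names of recipes whose payloads differ between projects.

    Both arguments are assumed to be DSSProject handles; recipe_names
    could come from e.g. template_project.list_recipes().
    """
    return [
        name for name in recipe_names
        if recipe_fingerprint(template_project.get_recipe(name))
        != recipe_fingerprint(user_project.get_recipe(name))
    ]
```

Note that project variables referenced inside the payload compare as literal text, so parameterized recipes with identical code still match.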
-
Getting Gini variable importance via API?
Looking at this post for guidance: https://community.dataiku.com/t5/Using-Dataiku/How-to-get-Variable-Importance-from-Model/m-p/3589 led me to this documentation: https://developer.dataiku.com/latest/api-reference/python/ml.html#exploration-of-results which documents a function called:…
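A heavily hedged sketch of one approach seen in community answers: pull the raw dict from the trained-model details and read the importance block. The `get_raw()` call and especially the `"iperf"`/`"rawImportance"` key path are assumptions to verify against your DSS version, and the block typically only exists for tree-based models:

```python
def variable_importance(details):
    """Return {variable: importance} from trained model details.

    `details` is assumed to be a DSSTrainedPredictionModelDetails handle;
    the key path raw["iperf"]["rawImportance"] with parallel 'variables'
    and 'importances' lists is an assumption, not confirmed documentation.
    """
    raw = details.get_raw()
    imp = raw.get("iperf", {}).get("rawImportance", {})
    return dict(zip(imp.get("variables", []), imp.get("importances", [])))
```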
-
Monthly partitioning changes partition column value
I am trying to set up monthly partitioning on a date column in my Snowflake database. I have the source table and output dataset set to monthly partitioning. In the middle I have a prepare recipe where I use the time range to get a month (screenshot below); the output of the posting_date field changes from an actual date,…