We are experiencing long execution times for a recipe in Dataiku due to handling large datasets. Although we have implemented partitioning using a filter on a specific column, it still takes 1.5-2 hours t…
Hi everyone, how can I get the difference in "hours", "minutes" AND "seconds" (example: 14:30:24) between two dates? Example: Col 1: 2018-01-01T09:50:15.000Z Col 2: 2018-01-01T10:07:55.000Z Col 3 (differen…
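One possible way to compute this in a Python step, shown as a minimal sketch: it assumes both columns arrive as ISO-8601 strings in exactly the format of the example above, and the helper name `hms_between` is hypothetical. It uses only the standard library, not any Dataiku-specific API.

```python
from datetime import datetime

# Assumed input format, taken from the example values in the question.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def hms_between(start_iso: str, end_iso: str) -> str:
    """Return the gap between two timestamps as HH:MM:SS (hypothetical helper)."""
    start = datetime.strptime(start_iso, ISO_FORMAT)
    end = datetime.strptime(end_iso, ISO_FORMAT)
    total = int((end - start).total_seconds())
    # Keep a leading minus sign when the second timestamp is earlier.
    sign = "-" if total < 0 else ""
    total = abs(total)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Hours are not wrapped at 24, so multi-day gaps show as e.g. 49:00:00.
    return f"{sign}{hours:02d}:{minutes:02d}:{seconds:02d}"


print(hms_between("2018-01-01T09:50:15.000Z", "2018-01-01T10:07:55.000Z"))  # 00:17:40
```

Working from total seconds rather than from separate hour/minute/second diffs avoids borrow errors when the seconds or minutes of the second timestamp are smaller than those of the first.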
Hello, I’m trying to use the sample.py after unzipping the archive of a model I exported. The model is a LightGBM with a feature selection step. The DSS version is 12.6.5. However, the Python s…
Today, fold processors require the DSS engine because they are not supported for in-database processing, which forces Dataiku designers to implement SQL recipes to perform fold operations. Most modern …
Hello, I'm trying to find the earliest of two date fields. I was thinking of using a min formula, but there may be missing values in these fields and the formula doesn't seem to work in that case.…
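The null-handling logic being asked about can be sketched in plain Python. This is only an illustration of the idea (ignore missing values, return nothing when both are missing), not the Dataiku formula itself; the function name `earliest` is hypothetical.

```python
from datetime import date
from typing import Optional


def earliest(a: Optional[date], b: Optional[date]) -> Optional[date]:
    """Return the earlier of two dates, ignoring missing values (hypothetical helper)."""
    present = [d for d in (a, b) if d is not None]
    # min() raises on an empty sequence, so handle the all-missing case explicitly.
    return min(present) if present else None


print(earliest(date(2024, 3, 1), None))              # 2024-03-01
print(earliest(date(2024, 3, 1), date(2024, 1, 15)))  # 2024-01-15
print(earliest(None, None))                           # None
```

Filtering out the missing values before calling `min` is the key step: a single missing operand then no longer makes the whole result empty.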
I have a data processing task that requires Python. Specifically, I'm reading data from a proprietary file format, then writing the extracted data out to the database. I want to split this data into m…
Let's say I have a table containing the following data:
ID | Fruit | Other Random Data
1 | Apple, Pear, Cherry | aksdhkajshda
2 | NULL | kasdhjkasjhkas
3 | Watermelon | ajshdgjashgdjashg
If I run the Split and Fold prepare step…
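For reference, here is a plain-Python sketch of what a split-and-fold operation does to a table like the one above. It is not the Dataiku processor and makes no claim about how that processor treats empty cells; the `keep_empty` flag and the function name are assumptions added purely to make the two possible treatments of the NULL row explicit.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def split_and_fold(rows: List[Row], column: str, sep: str = ",",
                   keep_empty: bool = True) -> List[Row]:
    """Emit one output row per separated value in `column` (hypothetical helper)."""
    out: List[Row] = []
    for row in rows:
        value = row[column]
        if value is None or value.strip() == "":
            # Assumption: the caller decides whether rows with no value survive.
            if keep_empty:
                out.append(dict(row))
            continue
        for part in value.split(sep):
            folded = dict(row)          # copy the other columns unchanged
            folded[column] = part.strip()
            out.append(folded)
    return out


rows = [
    {"ID": "1", "Fruit": "Apple, Pear, Cherry", "Other Random Data": "aksdhkajshda"},
    {"ID": "2", "Fruit": None, "Other Random Data": "kasdhjkasjhkas"},
    {"ID": "3", "Fruit": "Watermelon", "Other Random Data": "ajshdgjashgdjashg"},
]

folded = split_and_fold(rows, "Fruit")
for r in folded:
    print(r["ID"], r["Fruit"])
```

With `keep_empty=True` the row with ID 2 passes through with its empty Fruit value; with `keep_empty=False` it is dropped, so the output has four rows instead of five.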
I'm trying to use a regex in a replace() using the Formula processor in a prepare recipe. According to the documentation/docstring this should work, but I can't get it to recognize the text to replace…
I've got a data pipeline that runs on scenarios. There may or may not be new data every time the scenario runs. I have been doing some inefficient things each time the scenario runs. 1. I have been gua…