-
Functionality questions on Dataiku
* Can Spark be configured for ML algorithms? It looks like current processing is in memory.
* Is there a Spark processing option available for K-means clustering, PCA, and linear regression?
* Is LightGBM available with Spark?
* Is automated hyperparameter tuning available in Dataiku?
* Schema visibility:
* Current Dataiku…
-
Sync-recipe to Snowflake
I have a flow that gets data from two Snowflake sources; a Python recipe then checks the difference between the max(date column) of each and extracts the rows that are missing from the other dataset. I first tried syncing that back to Snowflake (updating the other source dataset by appending the missing rows) but encountered…
-
Regex function to return string between 2 characters
I'm trying to create a regex function that gives me the string between 2 characters. I have the string below: word1_word2_word3_word4_word5_word6_word7_word8_length_string.txt and I'm trying to return everything after the 7th instance of "_" and before ".txt". Desired output: word8_length_string. Is there a way to use a regex…
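One way to express "everything after the 7th underscore and before .txt" is a pattern that skips seven underscore-terminated groups and captures the rest. The following is a minimal Python sketch, not a Dataiku formula; the function name `after_nth_underscore` and its parameters are hypothetical.

```python
import re
from typing import Optional


def after_nth_underscore(name: str, n: int = 7, suffix: str = ".txt") -> Optional[str]:
    """Return the text after the n-th "_" and before the suffix, or None if no match."""
    # (?:[^_]*_){n} consumes exactly n "word_" groups from the start of the string;
    # (.*) captures what remains, up to the literal suffix at the end.
    pattern = rf"^(?:[^_]*_){{{n}}}(.*){re.escape(suffix)}$"
    match = re.match(pattern, name)
    return match.group(1) if match else None


example = "word1_word2_word3_word4_word5_word6_word7_word8_length_string.txt"
print(after_nth_underscore(example))  # word8_length_string
```

The captured group may itself contain underscores, which is why the pattern counts groups from the start rather than splitting on the last "_".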
-
Split columns to rows with new line delimiter
I have data structured as follows: the columns are "category" and "category details", and row 1 has "categoryname categorytype categorylength" in category details. My category details cell has three values separated only by new lines. Is there a way to split that field on a newline delimiter so that I get 2 new rows for those extra values? My desired output is: category | category details | 1…
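The row-splitting described above can be sketched in plain Python, for example inside a Python recipe. This is only an illustration under assumptions: rows are represented as dictionaries, and the helper name `split_rows_on_newline` is hypothetical. With pandas, splitting the column into lists and calling `DataFrame.explode` achieves the same result.

```python
from typing import Dict, List


def split_rows_on_newline(rows: List[Dict], column: str) -> List[Dict]:
    """Emit one output row per line found in `column`, copying the other fields."""
    result = []
    for row in rows:
        # splitlines() handles both "\n" and "\r\n" line endings.
        for part in str(row[column]).splitlines():
            new_row = dict(row)
            new_row[column] = part.strip()
            result.append(new_row)
    return result


# Sample data mirroring the question: one cell holding three newline-separated values.
data = [{"category": 1, "category details": "categoryname\ncategorytype\ncategorylength"}]
for r in split_rows_on_newline(data, "category details"):
    print(r)
```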
-
Delete copied zone in current project
I am unable to delete a copied zone in my current project. How can I delete it? Operating system used: Windows
-
Partitioning
Hi there, I have a problem aggregating a big data file based on S3. The data is stored this way: /2022-12-21T00/B00/part-00004-d89b41ad-3d2b-4350-9880-c5f1dfbdbea6.c000.csv.gz where the T00 stands for the hour and the B00 is a group that always contains the same subjects. What I am trying to achieve is to aggregate the B00…
-
Calculate the difference between 2 datetime values in hh:mm
I have two datetime values, date1: "2024-05-60 12:00:00" and date2: "2024-05-07 09:58:00". Is it possible to calculate the difference in hh:mm between those two datetimes in Dataiku? Operating system used: Windows
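As a rough illustration of the hh:mm calculation, here is a minimal Python sketch using only the standard library. The function name `diff_hhmm` is hypothetical. Note that "2024-05-60" is not a parseable date, so the example assumes "2024-05-06" was intended; that is a guess, not something the question confirms.

```python
from datetime import datetime


def diff_hhmm(first: str, second: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Return the absolute difference between two datetime strings as hh:mm."""
    delta = abs(datetime.strptime(second, fmt) - datetime.strptime(first, fmt))
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    # Hours are not wrapped at 24, so a 2-day gap is reported as 48:00.
    return f"{hours:02d}:{minutes:02d}"


# Assumed value for date1 (the original "2024-05-60" is not a valid date).
print(diff_hhmm("2024-05-06 12:00:00", "2024-05-07 09:58:00"))  # 21:58
```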
-
Recipe to create multiple datasets from a single dataset
I have a dataset called "MasterData" in the flow. I want to subset this data based on the "Country" column and save it after including the name of the country (e.g. "MasterDataAustralia") and save it in a zone built exclusively for that Country. I have around 60+ countries in the master data, and new countries may be added…
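The grouping step of this question can be sketched in plain Python. This is only a sketch of the subsetting logic under assumptions: rows are dictionaries, the helper name `split_by_country` is hypothetical, and creating the actual datasets and zones in the flow is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_country(rows: List[Dict], key: str = "Country",
                     prefix: str = "MasterData") -> Dict[str, List[Dict]]:
    """Group rows by country and name each group prefix + country."""
    groups = defaultdict(list)
    for row in rows:
        # New countries need no code change: each distinct value gets its own group.
        groups[f"{prefix}{row[key]}"].append(row)
    return dict(groups)


# Hypothetical sample rows standing in for "MasterData".
master = [
    {"Country": "Australia", "value": 1},
    {"Country": "France", "value": 2},
    {"Country": "Australia", "value": 3},
]
for name, subset in split_by_country(master).items():
    print(name, len(subset))
```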
-
How to create a dynamic contains function
I have 2 files/sheets, one that shows file names (the filename sheet) and one that shows colors (the colors sheet). What I'm trying to do is somehow link these two datasets: if a filename contains one of the colors in my colors sheet, then list that color in a new column. Is that possible to do in Dataiku? I thought…
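The "dynamic contains" idea can be sketched as a lookup over the list of colors for each filename. This is a minimal Python illustration, not a built-in Dataiku function; the name `find_colors` and the sample values are hypothetical.

```python
from typing import List


def find_colors(filename: str, colors: List[str]) -> List[str]:
    """Return every color from `colors` that appears in `filename`, ignoring case."""
    lowered = filename.lower()
    # Plain substring matching: "red" would also match inside "bored".
    return [color for color in colors if color.lower() in lowered]


# Hypothetical sample values standing in for the two sheets.
colors = ["red", "blue", "green"]
for name in ["report_Red_final.csv", "summary_blue_green.xlsx", "notes.txt"]:
    print(name, find_colors(name, colors))
```

Because the colors are passed in as data, adding a row to the colors sheet changes the result without touching the matching logic.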
-
GitHub configuration
Hi everyone, I'm trying to connect to my private GitHub repo via SSH key but I got this response: An error happened while adding the following remote: origin (git@github.com:user-repo.git), and then fetching it, caused by: IOException: Process failure, caused by: IOException: Process execution failed (return code 1)…