-
Window recipe not producing expected results when using DSS engine
Hi there, the issue I am having is that the DSS engine produces a completely different result from the SQL engine. Has anyone faced a similar issue? I would appreciate some insight on this. Basically, all I want to do is produce a column with the MAX() value inferred from another column. No partitions, no…
-
Fuzzy Join: When to use Relative to the Left vs Right Tables.
I'm starting to work with Fuzzy Joins and having good luck. However, I'm trying to figure out when I might want to use a Relative Threshold related to the Right or Left Table when doing an overall Left Join to find duplicate records. I understand that the proportions of items that need to match will be different based…
-
Empty values vs Null values
Hi everyone, I want to know how to distinguish between null values and empty values. I used the isNull() function in the Prepare recipe, but that function didn't recognize the difference. Regards
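To make the distinction in the question concrete, here is a minimal sketch in plain Python rather than in the Prepare recipe. The `classify_cell` helper is hypothetical, and treating whitespace-only strings as empty is an assumption, not something stated in the post.

```python
from typing import Optional


def classify_cell(value: Optional[str]) -> str:
    """Label a single cell as 'null', 'empty', or 'value'."""
    if value is None:
        # No value at all: this is the null case.
        return "null"
    if value.strip() == "":
        # A string is present but holds no visible characters.
        # Assumption: whitespace-only strings count as empty.
        return "empty"
    return "value"


cells = [None, "", "  ", "abc"]
labels = [classify_cell(cell) for cell in cells]
```

Keeping the two checks separate is the point: a single "is it blank?" test would merge the two cases the post wants to tell apart.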
-
Import Python script in library to code recipe
Hello, I wrote a script to clean my data. The script uses data from JSON files for computation purposes. I store the script and the JSON files in the library of my project. So here is how the library editor looks. I tried to import some functions and classes in a code recipe, but got the following error. Is there any way I…
-
How to scrub for keywords in an Excel sheet for an email inbox
I currently have an Excel sheet (sample data in image attached) showing the emails in my inbox with their body, subject, date sent, etc. How do I make Dataiku scrub through the bodies to retrieve common keywords? E.g., of the 9 emails there, Dataiku would show 3 of them as "marketing enquiries", and so on. I believe text…
-
Is it possible to write multiple datasources at a time using Python?
I have a data processing task that requires Python. Specifically, I'm reading data from a proprietary file format, then writing the extracted data out to the database. I want to split this data into multiple datasources, one for training data, one for holdouts, and one for bad data (so I can analyze corrupted data from the…
-
TensorFlow slices method using containerized execution
Hi Experts, I am using the TensorFlow slices method to batch process my images for a CNN model. The snapshot of the code is as follows, and it runs very well in a Jupyter notebook in Dataiku using local execution. When I run the same code in Dataiku containerized execution, it gives the following errors. I have seen documentation on…
-
Dynamic Rename of Headers
Hello all, first post here, and very new to Dataiku. I am attempting to consume data from an API call to QuickBase, and I am looking for a solution that accepts a dynamic number of columns and replaces any special characters, such as ( | , # so that I can output a file to BigQuery without issue. The team that owns the data…
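One way to approach the header clean-up described above, independent of how the data is loaded, is a small Python sketch like the following. The `sanitize_headers` helper is hypothetical, and it assumes the target only accepts letters, digits, and underscores in column names, with names not starting with a digit.

```python
import re


def sanitize_headers(headers):
    """Return cleaned, unique column names for any number of input headers."""
    cleaned = []
    used = set()
    for raw in headers:
        # Collapse each run of disallowed characters into one underscore,
        # then drop underscores left at either end of the name.
        name = re.sub(r"[^0-9A-Za-z_]+", "_", raw.strip()).strip("_")
        if not name:
            name = "col"  # placeholder for headers with no usable characters
        if name[0].isdigit():
            name = "_" + name  # assumption: names may not start with a digit
        # Append a numeric suffix until the name is unique.
        candidate, n = name, 1
        while candidate in used:
            candidate = f"{name}_{n}"
            n += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


example = sanitize_headers(["Order #", "Cost | USD", "Cost, USD", "2023 Total"])
```

Because the function works on whatever list of headers it is given, it does not need to know the number of columns in advance.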
-
Data transformation
Hi All, this is mock data for my input:

Rule | Account | Entity | Audit
R1 | a1,a2 | e1 |
R2 | | e2 | -c2

This has to be split and transformed to this form:

Rule | Dimension | Value
R1 | Account | a1
R1 | Account | a2
R1 | Entity | e1
R2 | Entity | e2
R2 | Audit | -c2

Then I have to search a database, say "db1", and find the levels :…
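The split-and-reshape step in this post can be sketched in plain Python as follows. This is only an illustration using the mock values from the post; the `unpivot` helper is hypothetical, and the later lookup against "db1" is not covered because the post is cut off at that point.

```python
# Mock input from the post: one row per rule, one column per dimension.
rows = [
    {"Rule": "R1", "Account": "a1,a2", "Entity": "e1", "Audit": ""},
    {"Rule": "R2", "Account": "", "Entity": "e2", "Audit": "-c2"},
]


def unpivot(rows, key="Rule"):
    """Turn wide rows into (rule, dimension, value) tuples."""
    out = []
    for row in rows:
        for dimension, cell in row.items():
            if dimension == key:
                continue  # the key column identifies the row, it is not a dimension
            # A cell may hold several comma-separated values; emit one tuple each.
            for value in cell.split(","):
                value = value.strip()
                if value:  # skip empty cells
                    out.append((row[key], dimension, value))
    return out


long_rows = unpivot(rows)
```

Each non-empty cell becomes one output row per comma-separated value, which is why R1 yields two Account rows.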
-
Replace a specific date with a future date
In the date column I have dates in the format '1900-01-01T00:00:00.000Z'. I need to replace this date with tomorrow's date dynamically. How can I do that using a formula? I.e., from '1900-01-01T00:00:00.000Z' to '18-08-2023'.
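The post asks for a formula; as a point of comparison, the same logic can be sketched in plain Python. The `replace_sentinel` helper and its `today` parameter are hypothetical, and the output format dd-MM-yyyy is inferred from the '18-08-2023' example in the post. Using the UTC date as "today" is also an assumption.

```python
from datetime import date, datetime, timedelta, timezone

# The placeholder date quoted in the post.
SENTINEL = "1900-01-01T00:00:00.000Z"


def replace_sentinel(value, today=None):
    """Return tomorrow's date as dd-MM-yyyy when value is the placeholder date."""
    if value != SENTINEL:
        return value  # leave every other value untouched
    if today is None:
        # Assumption: "today" is taken in UTC.
        today = datetime.now(timezone.utc).date()
    return (today + timedelta(days=1)).strftime("%d-%m-%Y")


# With 17 August 2023 as "today", the placeholder becomes the date from the post.
result = replace_sentinel(SENTINEL, today=date(2023, 8, 17))
```

Passing `today` explicitly keeps the function testable; in normal use it would be omitted so the replacement tracks the current date.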