-
How to choose between updating and appending an output table
I have a connection / output recipe that writes to a table in Snowflake. I don't see any option here to choose between updating and appending data. Where can I set that up? Does the option pop up after I press Run, or is it something I can set up beforehand? Operating system used: Windows
-
How to create a dynamic regex function
I have a column called filename, and the field is formatted as: name1_name2_name3_name4_name5_YYYYMMDDHHMMSS.txt.pgp; name1_name2_name3_name4_YYYYMMDDHHMMSS.txt.pgp; name1_name2_name3_name4_name5_name6_YYYYMMDDHHMMSS.txt.pgp. Up until now, the number of names can differ, but the filename always ends with YYYYMMDDHHMMSS.txt.pgp…
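Since the excerpt is cut off, it is not clear exactly what should be extracted. As a rough sketch, assuming the goal is to separate the variable-length list of names from the trailing timestamp, a single anchored pattern may be enough. The names `FILENAME_RE` and `split_filename` are made up for this illustration, and the 14-digit timestamp is inferred from the YYYYMMDDHHMMSS layout.

```python
import re

# Assumption: one or more underscore-separated name parts, then a
# 14-digit timestamp (YYYYMMDDHHMMSS) and the literal suffix ".txt.pgp".
FILENAME_RE = re.compile(r"^(?P<names>.+)_(?P<ts>\d{14})\.txt\.pgp$")


def split_filename(filename):
    """Return (list_of_names, timestamp) or None if the name does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    # Everything before the final "_<timestamp>" is split into name parts.
    return match.group("names").split("_"), match.group("ts")
```

Because the pattern is anchored on the timestamp and suffix at the end of the string, it does not need to know in advance how many name parts precede them.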
-
How to scrub for keywords in an Excel sheet for an email inbox
I currently have an Excel sheet (sample data in the attached image) showing the emails in my inbox with their body, subject, date sent, etc. How do I make Dataiku scrub through the bodies to retrieve common keywords? E.g., of the 9 emails there, Dataiku will have 3 of them show up as "marketing enquiries", and so on. I believe text…
-
Prevent Score recipe from resampling in time-series forecasting
Hello! I have built a time-series forecasting model in the lab and am using it to forecast/score sales data. I have a long-format dataset with SKU codes, order dates and sales for each date. For each SKU, I have consecutive dates and no duplicates. Nevertheless, the score recipe keeps on resampling the data, which takes a…
-
Does Dataiku support schema evolution?
Hello guys, we have many cases that require adding new columns to a dataset. However, the issue is that this dataset is shared across many projects and used downstream. Does Dataiku support schema evolution? Does adding new columns affect visual or code recipes on downstream datasets? Thanks, kind regards
-
Rebalancing training metrics
Hi everybody, I have a question about the performance metrics shown by Dataiku while estimating a model in DSS. In particular, we are applying training dataset rebalancing in order to deal with the unbalanced training dataset in our flow. We apply a 75% (approximate ratio) downsampling on the training set with 5-fold stratified cross…
-
How to combine several rows into one row?
Hello, my data looks like this:
Records | values
records_0_Name | Jimmy
records_0_Number | 1
records_0_Status | Student
records_1_Names | Marie
records_1_Number | 2
records_1_Status | Worker
And I want it to look like this:
Name | Number | Status
Jimmy | 1 | Student
Marie | 2 | Worker
Any ideas?
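One way to approach this reshape, sketched in plain Python (for example inside a Python recipe), is to parse each key into a record index and a field name, then group the values by index. This assumes every key follows the pattern `records_<index>_<field>` with consistent field names; the helper name `pivot_records` and the sample data are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: keys look like "records_<index>_<field>".
KEY_RE = re.compile(r"^records_(\d+)_(.+)$")


def pivot_records(pairs):
    """Turn (key, value) pairs into one dict per record index."""
    rows = defaultdict(dict)
    for key, value in pairs:
        match = KEY_RE.match(key)
        if match is None:
            continue  # skip keys that do not follow the assumed pattern
        index, field = int(match.group(1)), match.group(2)
        rows[index][field] = value
    # One output row per record, ordered by record index.
    return [rows[i] for i in sorted(rows)]


# Illustrative data with consistent field names.
sample = [
    ("records_0_Name", "Jimmy"),
    ("records_0_Number", "1"),
    ("records_0_Status", "Student"),
    ("records_1_Name", "Marie"),
    ("records_1_Number", "2"),
    ("records_1_Status", "Worker"),
]
result = pivot_records(sample)
```

Each dict in `result` corresponds to one row of the desired table, with the field names as columns.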
-
Reduce Default Sampling Nb. Records from 10000
Hi, I am wondering whether, at the user level, I can reduce the default number of records queried when sampling a dataset from the 10k default to a custom amount. Often I want to adjust the sampling parameters or filters, and waiting for 10k records to load before doing that is wasted time. Thanks!
-
Split dataset by stratified sampling.
When I try to "split" a dataset randomly, I currently get the following options:
- Full random
- Random subset
Neither of those is what I often use to split into training/test data: stratified sampling, to ensure that classes with very low presence (e.g. only a few dozen of 10000) are present in both sets. Is there…
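For reference, the kind of stratified split described here can be sketched in plain Python, for example in a Python recipe. This is only an illustration of the technique, not a built-in Dataiku option; the function name `stratified_split` and its parameters are made up for this sketch.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, seed=0):
    """Split rows into (train, test), sampling each class separately."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[key]].append(row)

    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's data is not reordered
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        if len(members) >= 2:
            # Keep at least one row of the class on each side of the split.
            n_test = min(max(n_test, 1), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test
```

Shuffling and cutting within each class, rather than over the whole dataset, is what keeps rare classes represented in both sets. A class with a single row cannot be split and ends up in the training set only.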
-
Sampling methods
Hi, I want to see my data based on more than one value belonging to a column. For example, the only conditions that we can use for filtering in sampling are shown in the attached document. Can we have something like the "IN" we use in SQL? For example, the capability now is: COUNTRY="INDIA" or %INDIA% (by using contains). How do we…