Transition some coding steps to Dataiku Recipe

timchen Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer Posts: 7 ✭✭✭


My team previously built a machine learning model in code, and I am now transitioning the steps from code to recipes.

I am curious whether I can use recipes to replicate the same data processing, or whether I have to stick with R.

i. Grouping Stage

Code written in R

jf <- dataS %>%
  group_by(COLUMN_NAME) %>%
  summarise(count_jf = n()) %>%
  mutate(Per = prop.table(count_jf)) %>%
  arrange(desc(Per)) %>%
  filter(Per > 0.005)

dataS$BUCKET_COLUMN_NAME <- ifelse(dataS$COLUMN_NAME %in% jf$COLUMN_NAME, dataS$JCOLUMN_NAME, 'OTHER')

NEW_BUCKET_COLUMN_NAME <- dataS %>%
  group_by(BUCKET_COLUMN_NAME) %>%
  summarise(MED_NEW_BUCKET_COLUMN_NAME = median(COLUMN_NAME2))

Basically, this creates some new columns based on grouping. I think I can do this with the Group recipe (with computed columns in it). The only issue for this step is the percentile: is there any way to get the top/bottom 5% percentile and eliminate it?
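To make the percentile question concrete, here is a minimal R sketch of what "get the top/bottom 5% and eliminate it" could mean, assuming it means dropping rows whose value lies below the 5th or above the 95th percentile of a numeric column. The data frame `toy` and the column `value` are hypothetical names used only for illustration. Note that the R code above does something different: `filter(Per > 0.005)` keeps categories whose share of rows exceeds 0.5%, which is a frequency threshold and not a percentile.

```r
# Hypothetical toy data: 'value' stands in for the numeric column to trim.
toy <- data.frame(id = 1:100, value = 1:100)

# 5th and 95th percentiles of the column (R's default quantile type 7).
bounds <- quantile(toy$value, probs = c(0.05, 0.95), na.rm = TRUE)

# Keep only the rows whose value lies between the two bounds (inclusive).
trimmed <- toy[toy$value >= bounds[[1]] & toy$value <= bounds[[2]], ]
```

Whatever recipe ends up replacing this step would need to reproduce the same two operations: compute the two percentile bounds, then filter rows against them.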

ii. Removing outliers

outlier_norm <- function(x) {
  qntile <- quantile(x, probs = c(.25, .75), na.rm = T)
  caps <- quantile(x, probs = c(.05, .95), na.rm = T)
  H <- 1.5 * IQR(x, na.rm = T)
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  return(x)
}

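Here is a function that handles outliers: values more than 1.5 × IQR below the first quartile or above the third quartile are replaced with the 5th or 95th percentile, respectively. For this one, I don't know which recipe I can use to perform the same calculation. Can anyone tell me whether this can be replicated with a Dataiku recipe?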
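As a reference point for anyone checking a recipe-based replacement, here is a small worked example of the function on a toy vector. The function body is copied from the post (only reformatted); the input vector `x` is a made-up illustration, not data from the project.

```r
# Same logic as the function in the post, reformatted for readability.
outlier_norm <- function(x) {
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)  # Q1 and Q3
  caps   <- quantile(x, probs = c(.05, .95), na.rm = TRUE)  # replacement values
  H      <- 1.5 * IQR(x, na.rm = TRUE)                      # fence width
  x[x < (qntile[1] - H)] <- caps[1]  # values below the lower fence get the 5th percentile
  x[x > (qntile[2] + H)] <- caps[2]  # values above the upper fence get the 95th percentile
  x
}

# Hypothetical input: nineteen ordinary values and one extreme value.
x <- c(1:19, 100)
capped <- outlier_norm(x)

# Q1 = 5.75, Q3 = 15.25, IQR = 9.5, so the fences are -8.5 and 29.5.
# Only the value 100 lies outside the fences; it is replaced by the
# 95th percentile of x, which is approximately 23.05.
```

So the function does not drop rows. It replaces extreme values, which means a recipe version needs the four quantiles first and then a conditional replacement on the column.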

Thank you very much for reading. Hopefully I can get some answers for these questions.


