Transitioning some coding steps to Dataiku recipes
Hello,
My team previously built a machine learning model, and I am now transitioning its steps from code to recipes.
I am curious whether I can use recipes to replicate the same data processing, or whether I have to stick with R.
i. Grouping Stage
Code written in R
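library(dplyr)

# Share of rows per category; keep categories with more than 0.5% of rows
jf <- dataS %>%
  group_by(COLUMN_NAME) %>%
  summarise(count_jf = n()) %>%
  mutate(Per = prop.table(count_jf)) %>%
  arrange(desc(Per)) %>%
  filter(Per > 0.005)

# Keep frequent categories as they are; bucket the rest into 'OTHER'
dataS$BUCKET_COLUMN_NAME <- ifelse(dataS$COLUMN_NAME %in% jf$COLUMN_NAME, dataS$COLUMN_NAME, 'OTHER')

# Median of COLUMN_NAME2 per bucket
NEW_BUCKET_COLUMN_NAME <- dataS %>%
  group_by(BUCKET_COLUMN_NAME) %>%
  summarise(MED_NEW_BUCKET_COLUMN_NAME = median(COLUMN_NAME2))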
Basically, this creates new columns based on grouping: categories of COLUMN_NAME that account for 0.5% of rows or less are bucketed into 'OTHER', and then the median of COLUMN_NAME2 is computed per bucket. I think I can do this with the Group recipe (with computed columns in it). The only issue in this step is the percentile: is there a way to get the top/bottom 5% percentile and eliminate it?
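To make the intended result concrete, here is a minimal sketch of the same bucketing logic in base R on made-up data. The data frame toy_df and its columns category and value are hypothetical stand-ins for dataS, COLUMN_NAME and COLUMN_NAME2; the 0.005 threshold is the one from my code above.

```r
# Made-up data: 1000 rows, two frequent categories and three rare ones
toy_df <- data.frame(
  category = c(rep("A", 600), rep("B", 397), "C", "D", "E"),
  value = c(rep(10, 600), rep(20, 397), 1, 2, 3),
  stringsAsFactors = FALSE
)

# Step 1: share of rows per category
shares <- prop.table(table(toy_df$category))

# Step 2: categories whose share is above the 0.5% threshold
frequent <- names(shares)[shares > 0.005]

# Step 3: keep frequent categories, bucket everything else as "OTHER"
toy_df$bucket <- ifelse(toy_df$category %in% frequent, toy_df$category, "OTHER")

# Step 4: median of value per bucket
bucket_medians <- aggregate(value ~ bucket, data = toy_df, FUN = median)
print(bucket_medians)
```

Here C, D and E each make up 0.1% of the rows, so they end up together in the "OTHER" bucket, and the output has one median per bucket. Whatever recipe combination I use would need to reproduce these four steps.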
ii. Removing outliers
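outlier_norm <- function(x) {
  # Quartiles define the fences; the 5th/95th percentiles are the replacement values
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  # Replace values outside [Q1 - H, Q3 + H] with the corresponding cap
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  return(x)
}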
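Here is a function that handles outliers: values below Q1 - 1.5 * IQR are replaced with the 5th percentile, and values above Q3 + 1.5 * IQR are replaced with the 95th percentile. For this one, I don't know which recipe I can use to perform the same calculation. Can anyone tell me whether this can be replicated with a Dataiku recipe?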
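For illustration, here is a small worked example of what outlier_norm does on made-up numbers. The vector x is hypothetical, and the function is repeated so the snippet runs on its own; the quantiles use R's default quantile method.

```r
# Same function as in my code, repeated so this snippet is self-contained
outlier_norm <- function(x) {
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  return(x)
}

# Made-up values with one obvious outlier
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Q1 = 3.25, Q3 = 7.75, IQR = 4.5, so the fences are -3.5 and 14.5.
# Only 100 is above the upper fence; it is replaced by the
# 95th percentile of x (about 59.05).
capped <- outlier_norm(x)
print(capped)
```

Note that the outlier is capped rather than dropped, so the number of rows stays the same. That is the behavior I would like to keep in the recipe version.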
Thank you very much for reading. Hopefully I can get some answers to these questions.
Best,
Tim