Hello,
My team previously built a machine learning model, and I am now transitioning its steps from code to recipes.
I am curious whether I can use visual recipes to replicate the same data processing, or whether I have to stick with R.
i. Grouping Stage
Code written in R
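library(dplyr)

# Share of rows per value of COLUMN_NAME; keep values above 0.5% of rows
jf <- dataS %>%
  group_by(COLUMN_NAME) %>%
  summarise(count_jf = n()) %>%
  mutate(Per = prop.table(count_jf)) %>%
  arrange(desc(Per)) %>%
  filter(Per > 0.005)

# Keep frequent values as-is and bucket the rest as 'OTHER'
dataS$BUCKET_COLUMN_NAME <- ifelse(dataS$COLUMN_NAME %in% jf$COLUMN_NAME,
                                   dataS$COLUMN_NAME, 'OTHER')

# Median of COLUMN_NAME2 within each bucket
NEW_BUCKET_COLUMN_NAME <- dataS %>%
  group_by(BUCKET_COLUMN_NAME) %>%
  summarise(MED_NEW_BUCKET_COLUMN_NAME = median(COLUMN_NAME2))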
Basically, this creates some new columns based on grouping. I think I can do this with the Group recipe (with computed columns in it). The only issue for this step is the percentile: is there a way to get the top/bottom 5% percentile and eliminate it?
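To make the intended result concrete, here is a minimal sketch of the grouping stage on made-up data. The 300 rows and their values are my own assumptions for illustration; the column names are the same placeholders as above.

```r
library(dplyr)

# Made-up data: "C" is 1 of 300 rows (about 0.33%), below the 0.5% cutoff.
dataS <- data.frame(
  COLUMN_NAME  = c(rep("A", 200), rep("B", 99), "C"),
  COLUMN_NAME2 = as.numeric(seq_len(300)),
  stringsAsFactors = FALSE
)

# Share of rows per category, keeping only categories above 0.5%.
jf <- dataS %>%
  group_by(COLUMN_NAME) %>%
  summarise(count_jf = n()) %>%
  mutate(Per = count_jf / sum(count_jf)) %>%
  filter(Per > 0.005)

# Rare categories fall into "OTHER"; then take the median per bucket.
result <- dataS %>%
  mutate(BUCKET_COLUMN_NAME = if_else(COLUMN_NAME %in% jf$COLUMN_NAME,
                                      COLUMN_NAME, "OTHER")) %>%
  group_by(BUCKET_COLUMN_NAME) %>%
  summarise(MED_NEW_BUCKET_COLUMN_NAME = median(COLUMN_NAME2))

print(result)
```

With this data the result should have three buckets (A, B and OTHER), with the single "C" row ending up in OTHER. This is the output I would like the recipe flow to reproduce.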
ii. Removing outliers
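outlier_norm <- function(x) {
  # 25th/75th percentiles define the IQR fences
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  # 5th/95th percentiles are the replacement (cap) values
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  # Values beyond the fences are replaced by the caps
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  return(x)
}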
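This function caps outliers rather than dropping them: values more than 1.5 × IQR below the 25th percentile or above the 75th percentile are replaced with the 5th or 95th percentile, respectively. For this one, I don't know which recipe could perform the same calculation. Can anyone tell me whether this can be replicated with a Dataiku recipe?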
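As a concrete check of the behaviour I need to reproduce, here is a small self-contained sketch that applies the function to a made-up vector (the data is purely illustrative, not from our dataset):

```r
# Self-contained copy of the capping function.
outlier_norm <- function(x) {
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps   <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  x
}

# Hypothetical data: ten ordinary values and one extreme value.
x <- c(1:10, 100)

# Quartiles are 3.5 and 8.5, so IQR = 5 and the fences are -4 and 16.
# Only 100 lies outside the fences; it is replaced by the 95th
# percentile of x (R's default quantile method), which is 55 here.
capped <- outlier_norm(x)
print(capped)
```

Note that in such a small sample the cap itself (55) still sits above the upper fence (16), because the 95th percentile is pulled up by the outlier. Any recipe-based replacement would need to compute the quantiles the same way to match these numbers.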
Thank you very much for reading. Hopefully I can get some answers for these questions.
Best,
Tim