Transition some coding steps to Dataiku Recipe

timchen · August 2021

Hello,

My team previously built a machine learning model, and I am now transitioning its steps from code to recipes.

I am curious whether I can use recipes to replicate the same data processing, or whether I have to stick with R.

i. Grouping Stage

Code written in R

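library(dplyr)

# Share of rows per category; keep categories with more than 0.5% of rows
jf <- dataS %>%
  group_by(COLUMN_NAME) %>%
  summarise(count_jf = n()) %>%
  mutate(Per = prop.table(count_jf)) %>%
  arrange(desc(Per)) %>%
  filter(Per > 0.005)

# Keep frequent categories as-is; collapse the rest into 'OTHER'
dataS$BUCKET_COLUMN_NAME <- ifelse(dataS$COLUMN_NAME %in% jf$COLUMN_NAME,
                                   dataS$COLUMN_NAME, 'OTHER')

# Median of COLUMN_NAME2 within each bucket
NEW_BUCKET_COLUMN_NAME <- dataS %>%
  group_by(BUCKET_COLUMN_NAME) %>%
  summarise(MED_NEW_BUCKET_COLUMN_NAME = median(COLUMN_NAME2))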

Basically, this creates some new columns based on grouping, and I think I can do it with the Group recipe (with computed columns in it). The only issue in this step is the percentile: is there any way to get the top/bottom 5% percentile and eliminate it?
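
To make the intended result concrete, here is a minimal sketch of the same bucketing logic on made-up data. It is written in base R so it runs without dplyr; the toy values and the helper names `share`, `keep` and `med` are assumptions for illustration only, not part of the original model.

```r
# Toy data (hypothetical): one category column and one numeric column.
dataS <- data.frame(
  COLUMN_NAME  = c(rep("A", 600), rep("B", 397), rep("C", 3)),
  COLUMN_NAME2 = c(rep(10, 600), rep(20, 397), 100, 200, 300),
  stringsAsFactors = FALSE
)

# Share of rows per category; keep categories with more than 0.5% of rows.
share <- prop.table(table(dataS$COLUMN_NAME))
keep  <- names(share)[as.vector(share) > 0.005]

# Frequent categories stay as-is; rare ones collapse into 'OTHER'.
dataS$BUCKET_COLUMN_NAME <- ifelse(dataS$COLUMN_NAME %in% keep,
                                   dataS$COLUMN_NAME, "OTHER")

# Median of the numeric column within each bucket.
med <- aggregate(COLUMN_NAME2 ~ BUCKET_COLUMN_NAME, data = dataS, FUN = median)
names(med)[2] <- "MED_NEW_BUCKET_COLUMN_NAME"
print(med)
```

With this toy data, "C" makes up 0.3% of the rows, so it is the only category that ends up in the 'OTHER' bucket.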

ii. Removing outliers

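outlier_norm <- function(x) {
  # Quartiles define the fences; 5th/95th percentiles are the replacement values
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  # Cap values outside the 1.5 * IQR fences
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  return(x)
}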

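Here is a function that handles outliers: values more than 1.5 × IQR below the first quartile or above the third quartile are replaced with the 5th or 95th percentile, respectively. For this one, I don't know which recipe could perform the same calculation. Can anyone tell me whether it is possible to replicate this with a Dataiku recipe?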
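
As a quick illustration of what any replacement would need to reproduce, here is a small sketch that applies the function to a made-up vector. The input `x` and the name `capped` are assumptions for the example; the function body repeats the one above so the snippet runs on its own.

```r
outlier_norm <- function(x) {
  qntile <- quantile(x, probs = c(.25, .75), na.rm = TRUE)
  caps <- quantile(x, probs = c(.05, .95), na.rm = TRUE)
  H <- 1.5 * IQR(x, na.rm = TRUE)
  x[x < (qntile[1] - H)] <- caps[1]
  x[x > (qntile[2] + H)] <- caps[2]
  x
}

# Hypothetical input: 1..20 plus one extreme value.
x <- c(1:20, 1000)

# Quartiles are 6 and 16, so IQR = 10 and the fences are -9 and 31.
# Only 1000 lies outside the fences; it is replaced by the 95th percentile.
capped <- outlier_norm(x)
print(capped)
```

Note that the outlier is capped rather than dropped, so the output has the same length as the input.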

Thank you very much for reading. Hopefully I can get some answers for these questions.

Best,

Tim
