Repeated Random Splitting and Bootstrapping with XGBoost

Stuart_Song (Registered Posts: 1)

I have a dataset that I want to randomly split into train and test sets with an 80/20 ratio. I aim to repeat this random splitting, plus bootstrapping of the training data, 1,000 times. For each iteration, I'll train an XGBoost model and then export the SHAP values, the Gini importance of each feature, the F1 score, and the ROC AUC of the model.
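
To make this concrete, here is a rough sketch of the loop I have in mind, written with plain pandas/scikit-learn/xgboost/shap outside of Dataiku. The DataFrame `df`, the target column name, and the `params` dict of tuned hyperparameters are placeholders, and I'm using XGBoost's "gain" importance as the closest stand-in for a Gini-style index:

```python
# Rough sketch only: repeated 80/20 splitting, bootstrapping of the training
# data, and per-iteration metric export. Names and parameters are placeholders.
import numpy as np
import pandas as pd
import shap
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

def run_iterations(df, target, params, n_iter=1000):
    results = []
    for i in range(n_iter):
        # 80/20 random split, different seed each iteration
        train, test = train_test_split(
            df, test_size=0.2, random_state=i, stratify=df[target]
        )
        # bootstrap the training data (sample with replacement)
        boot = train.sample(n=len(train), replace=True, random_state=i)

        X_train, y_train = boot.drop(columns=[target]), boot[target]
        X_test, y_test = test.drop(columns=[target]), test[target]

        # params = the tuned hyperparameters copied from the exported notebook
        model = XGBClassifier(**params)
        model.fit(X_train, y_train)

        pred = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]

        # "gain" importance as a stand-in for a Gini-style index,
        # SHAP values via TreeExplainer on the held-out test set
        importance = model.get_booster().get_score(importance_type="gain")
        shap_values = shap.TreeExplainer(model).shap_values(X_test)

        results.append({
            "iteration": i,
            "f1": f1_score(y_test, pred),
            "roc_auc": roc_auc_score(y_test, proba),
            "importance": importance,
            "mean_abs_shap": dict(zip(X_test.columns, np.abs(shap_values).mean(axis=0))),
        })
    return pd.DataFrame(results)
```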

I'm aware that Dataiku handles feature preprocessing (e.g. scaling) and hyperparameter tuning, but the Jupyter notebook I exported only includes the tuned hyperparameters. I've tried hyperparameter tuning with other packages, but I can never get results as good as Dataiku's.

Could anyone guide me on how to perform this process efficiently in Python, using the Dataiku API or anything else I can do within Dataiku DSS?
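
On the Dataiku side, I imagine something like the minimal sketch below, assuming it runs in a DSS Python recipe or notebook and calls the `run_iterations` function from the sketch above. The dataset names, target column, and `tuned_params` values are placeholders:

```python
# Minimal sketch, assuming this runs inside a DSS Python recipe or notebook.
# "my_input", "my_results", "label", and tuned_params are placeholders.
import dataiku

input_ds = dataiku.Dataset("my_input")
df = input_ds.get_dataframe()  # load the prepared dataset as a pandas DataFrame

tuned_params = {"n_estimators": 200, "max_depth": 5}  # copied from the exported notebook
results_df = run_iterations(df, target="label", params=tuned_params, n_iter=1000)

# dict-valued columns (importance, mean_abs_shap) may need json.dumps before writing
output_ds = dataiku.Dataset("my_results")
output_ds.write_with_schema(results_df)
```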
