ML Model Training Failed

omarh2m Registered Posts: 6


I attempted to train a partitioned XGBoost model using a Python 3.8 code environment. The logs are in the attached .txt file, and the training diagnosis is in the attached zip file.

Could you assist me in comprehending the underlying cause and suggest potential solutions to address this issue?

However, the model trains perfectly using the built-in code env.

This is really urgent for me.

Thank you! Omar.

Operating system used: centos (7)


  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker

    Hi @omarh2m

    In case anyone encounters this in the future, I'm copying my response here as well:

    From the logs, I see that training is failing for most partitions because the code environment's Docker image hasn't been built:
    [2024/04/05-15:52:43.128] [MRT-110744] [WARN] []  - Training failed
    com.dataiku.dip.exceptions.CodedIOException: No recorded image tag for env PYTHON PYTHON38 version= null. Maybe you need to build Docker image
    at com.dataiku.dip.containers.exec.ContainerExecImagesHelper.getImageVersionToUse(
    at com.dataiku.dip.containers.exec.ContainerExecImagesHelper.getImageTagToUse(
    at com.dataiku.dip.analysis.coreservices.AnalysisMLContainerKernel.start(
    This means that you need to navigate to your code environment under Administration > Code envs > PYTHON38 > Containerized execution and make sure that the container configuration "ebstgdssk8s" is selected under "Selected container configurations" (or that "All container configurations" is selected).

    This ensures that a Docker image gets built for the code environment. Once you've selected the container configuration, click "Update" to build the code environment image. Once the build succeeds, you can retry your training.
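
If you have several training logs to check, a small script can confirm which code environments are reported without a Docker image. This is only a minimal sketch: the regular expression is based on the single message quoted above, the helper name `envs_missing_image` is made up for illustration, and the exact log wording may differ between DSS versions.

```python
import re

# Pattern derived from the one log line quoted in this thread (an assumption,
# not an official log format): "No recorded image tag for env PYTHON PYTHON38 ..."
MISSING_IMAGE = re.compile(r"No recorded image tag for env (\w+) (\S+)")


def envs_missing_image(log_text):
    """Return sorted, de-duplicated (language, env name) pairs reported without an image tag."""
    return sorted({(m.group(1), m.group(2)) for m in MISSING_IMAGE.finditer(log_text)})


# Sample text copied from the log excerpt above.
sample = (
    "[2024/04/05-15:52:43.128] [MRT-110744] [WARN] []  - Training failed\n"
    "com.dataiku.dip.exceptions.CodedIOException: No recorded image tag for env "
    "PYTHON PYTHON38 version= null. Maybe you need to build Docker image\n"
)

print(envs_missing_image(sample))
```

Any environment the script lists is one whose image still needs to be built from the Containerized execution settings described above.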

