ML Model Training Failed

omarh2m

Hello,

I attempted to train a partitioned XGBoost model using a Python 3.8 code environment. The log files with the details are in the attached .txt file, and the training diagnostics are in the attached zip file.

Could you assist me in comprehending the underlying cause and suggest potential solutions to address this issue?

However, the model trains perfectly using the built-in code env.

This is really urgent for me.

Thank you! Omar.

Operating system used: centos (7)

SarinaS

Hi @omarh2m,

In case anyone encounters this in the future, I'm copying my response here as well:

From the logs, I see that the training is failing for most partitions because the Docker image for the code environment hasn't been built:

[2024/04/05-15:52:43.128] [MRT-110744] [WARN] [dku.analysis.ml.python]  - Training failed
com.dataiku.dip.exceptions.CodedIOException: No recorded image tag for env PYTHON PYTHON38 version= null. Maybe you need to build Docker image
    at com.dataiku.dip.containers.exec.ContainerExecImagesHelper.getImageVersionToUse(ContainerExecImagesHelper.java:77)
    at com.dataiku.dip.containers.exec.ContainerExecImagesHelper.getImageTagToUse(ContainerExecImagesHelper.java:89)
    at com.dataiku.dip.analysis.coreservices.AnalysisMLContainerKernel.start(AnalysisMLContainerKernel.java:171)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:164)

This means that you need to navigate to your code environment under Administration > Code envs > PYTHON38 > Containerized execution and make sure that the container configuration "ebstgdssk8s" is selected under "Selected container configurations" (or that "All container configurations" is selected).

This ensures that a Docker image is built for the code environment. Once you've selected the configuration, click "Update" to build the code environment image. Once the build succeeds, you can retry your training.
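
If you have many partitions and want to confirm which code environments are affected, a small script can scan the training log for this error. This is only a rough sketch: the pattern is copied from the error message quoted above and may be worded differently in other DSS versions, and the function name `envs_missing_image` is made up for illustration.

```python
import re

# Pattern based on the error message quoted above (assumption: the wording
# "No recorded image tag for env <LANG> <NAME> version=" is stable in your logs).
MISSING_IMAGE = re.compile(r"No recorded image tag for env (\w+) (\S+) version=")

def envs_missing_image(log_text):
    """Return the sorted, de-duplicated (language, env name) pairs that
    the log reports as having no built container image."""
    return sorted({(m.group(1), m.group(2)) for m in MISSING_IMAGE.finditer(log_text)})

# Sample line taken from the log excerpt in this thread.
sample = (
    "com.dataiku.dip.exceptions.CodedIOException: No recorded image tag for env "
    "PYTHON PYTHON38 version= null. Maybe you need to build Docker image"
)
print(envs_missing_image(sample))  # [('PYTHON', 'PYTHON38')]
```

To use it on a real log, read the attached .txt file into a string and pass it to the function. Any environment it lists needs its image built as described above.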

Thanks,
Sarina
