Failed to train : <class 'UnicodeEncodeError'> : charmap

Pranay Registered Posts: 7

Hello, I am using the free trial version of Dataiku DSS 12.1.2 on localhost to build a recommendation system. While training the model, I get the following error: "Failed to train : <class 'UnicodeEncodeError'> : charmap", with the log snippet shown below. Can someone please help me solve this issue? I have tried updating the packages and rebuilding the env, but that didn't work either.


[2023/08/04-13:12:38.670] [MRT-1706] [INFO] [dku.kernels] - Process was cleaned up by monitoring thread
[2023/08/04-13:12:38.673] [MRT-1706] [INFO] [dku.kernels] - Trying to enrich exception: Failed to train : <class 'UnicodeEncodeError'> : charmap from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@2855092d retcode=0
[2023/08/04-13:12:38.676] [MRT-1706] [WARN] [] - Training Failed to train : <class 'UnicodeEncodeError'> : charmap
at$AsyncResult.checkException($AsyncResult.get($
[2023/08/04-13:12:38.679] [MRT-1706] [INFO] [] - Closed socket
[2023/08/04-13:12:38.681] [MRT-1706] [INFO] [] - Closed socket
[2023/08/04-13:12:38.684] [MRT-1706] [INFO] [] - Closed serverSocket
[2023/08/04-13:12:38.686] [MRT-1706] [ERROR] [] - Processing Failed to train : <class 'UnicodeEncodeError'> : charmap
at$AsyncResult.checkException($AsyncResult.get($
[2023/08/04-13:12:38.690] [MRT-1706] [INFO] [] - Locking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.699] [MRT-1706] [INFO] [] - Unlocking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.702] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [] T-mgjMlxna - [ct: 84515] Processing thread joined ...
[2023/08/04-13:12:38.706] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis] T-mgjMlxna - [ct: 84519] Train done

Operating system used: Windows (10 Enterprise)
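For readers hitting the same message, here is a minimal sketch of how a "charmap" UnicodeEncodeError can arise in plain Python. It assumes the text is being encoded with a Windows code page such as cp1252; the log above does not confirm which encoding DSS used, and `try_encode` is a hypothetical helper written only for this illustration.

```python
def try_encode(text, encoding="cp1252"):
    """Return None if text encodes cleanly, else the error message.

    cp1252 is an assumption: it is a common default code page on
    Windows, but the actual encoding in the failing job is unknown.
    """
    try:
        text.encode(encoding)
        return None
    except UnicodeEncodeError as exc:
        return str(exc)


# "é" exists in cp1252, so this encodes without error.
print(try_encode("café"))

# The arrow U+2192 has no cp1252 mapping, so the codec raises
# a UnicodeEncodeError that mentions 'charmap'.
print(try_encode("price \u2192 10"))
```

Note that not every non-ASCII character triggers the error under this assumption; only characters missing from the active code page do.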


  • JordanB
    JordanB Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 293 Dataiker

    Hi @Pranay

    It appears that there is a character encoding issue. Please check your dataset for non-ASCII characters. One way to remove them is to add a Prepare recipe and use the "Simplify text" or "Transform string" processor on the affected columns.

    Let me know if you have any questions.
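As a rough illustration of the check suggested above, the following sketch locates non-ASCII characters in a list of strings and strips them in plain Python, for example from a notebook. It is not the implementation of the DSS processors; `find_non_ascii`, `to_ascii`, and the sample `titles` are hypothetical names and data used only for this example.

```python
import unicodedata


def find_non_ascii(values):
    """Return (index, sorted offending characters) for each value
    that contains at least one non-ASCII character."""
    hits = []
    for i, value in enumerate(values):
        bad = sorted({ch for ch in value if ord(ch) > 127})
        if bad:
            hits.append((i, bad))
    return hits


def to_ascii(text):
    """Decompose accented letters (NFKD), then drop every character
    that is still outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical sample values standing in for a text column.
titles = ["Amelie", "Am\u00e9lie", "Wall\u00b7E \u2192 2008"]

print(find_non_ascii(titles))          # which rows need attention
print([to_ascii(t) for t in titles])   # cleaned values
```

Dropping characters is lossy: accented letters keep their base letter, but symbols with no ASCII decomposition disappear entirely, so it is worth reviewing the flagged rows before cleaning a whole column.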



  • Pranay
    Pranay Registered Posts: 7

    Thanks for the solution. For the short term, I manually removed the non-ASCII characters from my small dataset, but the long-term solution of using the "Transform string" processor works perfectly!
