Failed to train : <class 'UnicodeEncodeError'> : charmap

Registered Posts: 7
edited July 2024 in Using Dataiku

Hello, I am using the free trial version of Dataiku DSS 12.1.2 on localhost to build a recommendation system. While training the model, I get the following error: "Failed to train : <class 'UnicodeEncodeError'> : charmap", with the log snippet shown below. Can someone please help me solve this issue? I have tried updating the packages and rebuilding the code env, but that didn't help either.

Logs:

[2023/08/04-13:12:38.670] [MRT-1706] [INFO] [dku.kernels]  - Process was cleaned up by monitoring thread
[2023/08/04-13:12:38.673] [MRT-1706] [INFO] [dku.kernels]  - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@2855092d retcode=0
[2023/08/04-13:12:38.676] [MRT-1706] [WARN] [dku.analysis.ml.python]  - Training failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap
    at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
    at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:211)
    at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:76)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:170)
[2023/08/04-13:12:38.679] [MRT-1706] [INFO] [dku.block.link]  - Closed socket
[2023/08/04-13:12:38.681] [MRT-1706] [INFO] [dku.block.link]  - Closed socket
[2023/08/04-13:12:38.684] [MRT-1706] [INFO] [dku.block.link]  - Closed serverSocket
[2023/08/04-13:12:38.686] [MRT-1706] [ERROR] [dku.analysis.ml.python]  - Processing failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap
    at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
    at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:211)
    at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:76)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:170)
[2023/08/04-13:12:38.690] [MRT-1706] [INFO] [dku.analysis.ml]  - Locking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.699] [MRT-1706] [INFO] [dku.analysis.ml]  - Unlocking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.702] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis.ml.python] T-mgjMlxna - [ct: 84515] Processing thread joined ...
[2023/08/04-13:12:38.706] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis] T-mgjMlxna - [ct: 84519] Train done

Operating system used: Windows (10 Enterprise)
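For context, here is a minimal Python sketch of the same class of error outside DSS. It assumes the failing step was encoding text with a Windows code page such as cp1252; the logs do not show which string or codec was actually involved, so treat this as an illustration rather than a diagnosis. Python reports single-byte code pages like cp1252 under the codec name "charmap", which is why the error message only says "charmap". The `try_encode` helper and the sample text are hypothetical.

```python
# Hypothetical illustration of a "charmap" UnicodeEncodeError; not DSS code.
from typing import Optional


def try_encode(text: str, codec: str = "cp1252") -> Optional[UnicodeEncodeError]:
    """Encode text with the given codec; return the error instead of raising it."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc
    return None


# Hypothetical sample: "é" exists in cp1252, but the arrow and CJK characters do not.
sample = "caf\u00e9 \u2192 \u4e2d\u6587"

err = try_encode(sample)
if err is not None:
    # For cp1252 Python reports the codec as "charmap";
    # ascii() escapes the offending character so it prints safely on any console
    print(err.encoding, ascii(sample[err.start]))

# UTF-8 can represent every character in the sample, so no error is returned
print(try_encode(sample, "utf-8"))
```

The UTF-8 call succeeding shows that the characters themselves are valid; the failure comes from the narrower target code page.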

Answers

  • Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 297 Dataiker

    Hi @Pranay,

    This appears to be a character encoding issue. Please check your dataset for non-ASCII characters. One way to remove them is to use a Prepare recipe with the "Simplify text" or "Transform string" processor.

    Let me know if you have any questions.

    Thanks!

    Jordan
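Following the suggestion to check the dataset for non-ASCII characters, a small script like this can locate them before cleaning. It is a sketch that assumes the data is available as a list of dicts (for example, rows read from an exported CSV file); `find_non_ascii` and the sample rows are hypothetical and not part of the DSS API. ASCII is a stricter target than cp1252, so this may also flag characters (such as "é") that would not cause the error on their own.

```python
# Hypothetical helper: report where non-ASCII characters occur in tabular data.
from typing import Any, Dict, Iterable, Iterator, Tuple


def find_non_ascii(rows: Iterable[Dict[str, Any]]) -> Iterator[Tuple[int, str, str]]:
    """Yield (row_index, column, character) for each non-ASCII character in string cells."""
    for i, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # skip numbers, None, and other non-text cells
            for ch in value:
                if ord(ch) > 127:  # ASCII covers code points 0-127
                    yield i, column, ch


# Hypothetical sample rows
rows = [
    {"item_id": 1, "title": "Caf\u00e9 cr\u00e8me"},
    {"item_id": 2, "title": "Plain title"},
    {"item_id": 3, "title": "Arrow \u2192 here"},
]

for row_index, column, ch in find_non_ascii(rows):
    # ascii() escapes the character so the report prints safely on any console
    print(row_index, column, ascii(ch))
```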

  • Registered Posts: 7

    Thanks for the solution. As a short-term fix, I manually removed the non-ASCII characters from my small dataset, and the long-term solution of using "Transform string" works perfectly!
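If the cleanup needs to happen in code rather than in a Prepare recipe, a Python step could normalize text before training. This is a sketch of one common technique (Unicode NFKD decomposition followed by dropping non-ASCII characters), not a description of how the "Simplify text" or "Transform string" processors are implemented; `to_ascii` is a hypothetical name.

```python
# Hypothetical cleanup: fold accented letters to ASCII and drop other non-ASCII characters.
import unicodedata


def to_ascii(value: str) -> str:
    """Return an ASCII-only version of value."""
    # NFKD splits accented letters into base letter + combining mark (e.g. "è" -> "e" + U+0300)
    decomposed = unicodedata.normalize("NFKD", value)
    # errors="ignore" drops the combining marks and any character with no ASCII form
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii("Cr\u00e8me br\u00fbl\u00e9e"))  # accents are folded: Creme brulee
print(to_ascii("Arrow \u2192 here"))  # the arrow has no ASCII form and is removed
```

Note that characters with no ASCII equivalent (arrows, CJK text) are dropped entirely, so this suits mostly Latin-script data such as accented item titles.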
