Failed to train : <class 'UnicodeEncodeError'> : charmap
Hello, I am using the free trial version of Dataiku DSS 12.1.2 on localhost to build a recommendation system. While training the model, I get the following error: "Failed to train : <class 'UnicodeEncodeError'> : charmap", with the log snippet shown below. Can someone please help me solve this issue? I have tried updating the packages and rebuilding the env, but that didn't work either.
Logs:
[2023/08/04-13:12:38.670] [MRT-1706] [INFO] [dku.kernels] - Process was cleaned up by monitoring thread
[2023/08/04-13:12:38.673] [MRT-1706] [INFO] [dku.kernels] - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@2855092d retcode=0
[2023/08/04-13:12:38.676] [MRT-1706] [WARN] [dku.analysis.ml.python] - Training failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap
    at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
    at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:211)
    at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:76)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:170)
[2023/08/04-13:12:38.679] [MRT-1706] [INFO] [dku.block.link] - Closed socket
[2023/08/04-13:12:38.681] [MRT-1706] [INFO] [dku.block.link] - Closed socket
[2023/08/04-13:12:38.684] [MRT-1706] [INFO] [dku.block.link] - Closed serverSocket
[2023/08/04-13:12:38.686] [MRT-1706] [ERROR] [dku.analysis.ml.python] - Processing failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap
    at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
    at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
    at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:211)
    at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:76)
    at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:170)
[2023/08/04-13:12:38.690] [MRT-1706] [INFO] [dku.analysis.ml] - Locking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.699] [MRT-1706] [INFO] [dku.analysis.ml] - Unlocking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.702] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis.ml.python] T-mgjMlxna - [ct: 84515] Processing thread joined ...
[2023/08/04-13:12:38.706] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis] T-mgjMlxna - [ct: 84519] Train done
Operating system used: Windows (10 Enterprise)
Answers
-
JordanB (Dataiker)
Hi @Pranay,
It appears that there is a character encoding issue. Please check your dataset for non-ASCII characters. One way to remove them would be to use a Prepare recipe with the "Simplify text" or "Transform string" processor.
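If it helps to see which cells are affected before cleaning them, here is a minimal sketch in plain Python. The sample rows and the helper name `find_non_ascii` are made up for illustration; in practice the rows would come from your own dataset. Results are printed with `ascii()` so that printing cannot itself hit the same encoding error on a Windows console.

```python
# Hypothetical sample rows; replace with rows read from your own dataset.
rows = [
    {"user": "alice", "item": "Caf\u00e9 au lait"},    # contains U+00E9
    {"user": "bob", "item": "Plain tea"},              # ASCII only
    {"user": "carol", "item": "Arrow \u2192 deluxe"},  # contains U+2192
]

def find_non_ascii(rows):
    """Return (row_index, column, offending_chars) for every string cell
    that contains at least one non-ASCII character."""
    hits = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            if not isinstance(value, str):
                continue
            # Collect the distinct characters outside the ASCII range.
            bad = sorted({ch for ch in value if ord(ch) > 127})
            if bad:
                hits.append((i, col, bad))
    return hits

for hit in find_non_ascii(rows):
    # ascii() escapes non-ASCII characters, so this print is safe anywhere.
    print(ascii(hit))
```

Columns that show up here are the ones to run through the Prepare recipe.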
Let me know if you have any questions.
Thanks!
Jordan
-
Thanks for the solution. For the short term, I manually removed the non-ASCII characters from my small dataset, but the long-term solution of using "Transform string" works perfectly!
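For anyone who would rather do the cleanup in code (for example in a Python recipe) than by hand, a rough sketch using only the standard library follows. This is an assumed equivalent, not what the DSS processors do internally, and the function name `to_ascii` is hypothetical. It decomposes accented letters so the base letter survives, then drops whatever is still non-ASCII.

```python
import unicodedata

def to_ascii(text):
    """Approximate text as ASCII: split accented characters into
    base letter + combining mark, then discard all non-ASCII code points."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

# Hypothetical example value: the accent is stripped, the arrow is dropped.
cleaned = to_ascii("Caf\u00e9 \u2192 deluxe")
print(cleaned)
```

Note that characters with no ASCII counterpart are removed entirely, so this is lossy; check that the cleaned values still make sense for your recommendation data.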