Hello, I am using the free trial version of Dataiku DSS 12.1.2 on localhost to build a recommendation system. While training the model, I get the following error: "Failed to train : <class 'UnicodeEncodeError'> : charmap", with the log snippet shown below. Can someone please help me solve this issue? I have tried updating the packages and rebuilding the env, but that didn't work either.
Logs:
[2023/08/04-13:12:38.670] [MRT-1706] [INFO] [dku.kernels] - Process was cleaned up by monitoring thread
[2023/08/04-13:12:38.673] [MRT-1706] [INFO] [dku.kernels] - Trying to enrich exception: com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap from kernel com.dataiku.dip.analysis.coreservices.AnalysisMLKernel@2855092d retcode=0
[2023/08/04-13:12:38.676] [MRT-1706] [WARN] [dku.analysis.ml.python] - Training failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap
at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:211)
at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:76)
at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:170)
[2023/08/04-13:12:38.679] [MRT-1706] [INFO] [dku.block.link] - Closed socket
[2023/08/04-13:12:38.681] [MRT-1706] [INFO] [dku.block.link] - Closed socket
[2023/08/04-13:12:38.684] [MRT-1706] [INFO] [dku.block.link] - Closed serverSocket
[2023/08/04-13:12:38.686] [MRT-1706] [ERROR] [dku.analysis.ml.python] - Processing failed
com.dataiku.dip.io.SocketBlockLinkKernelException: Failed to train : <class 'UnicodeEncodeError'> : charmap
at com.dataiku.dip.io.SocketBlockLinkInteraction.throwExceptionFromPython(SocketBlockLinkInteraction.java:302)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.checkException(SocketBlockLinkInteraction.java:215)
at com.dataiku.dip.io.SocketBlockLinkInteraction$AsyncResult.get(SocketBlockLinkInteraction.java:190)
at com.dataiku.dip.io.SingleCommandKernelLink$1.call(SingleCommandKernelLink.java:211)
at com.dataiku.dip.analysis.ml.prediction.PredictionTrainAdditionalThread.process(PredictionTrainAdditionalThread.java:76)
at com.dataiku.dip.analysis.ml.shared.PRNSTrainThread.run(PRNSTrainThread.java:170)
[2023/08/04-13:12:38.690] [MRT-1706] [INFO] [dku.analysis.ml] - Locking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.699] [MRT-1706] [INFO] [dku.analysis.ml] - Unlocking model train info file C:\Users\t0278540\AppData\Local\Dataiku\DataScienceStudio\dss_home\analysis-data\RECOMMENDATION_ATTEMPT1\mi8OCo9P\mgjMlxna\sessions\s2\pp1\m1\train_info.json
[2023/08/04-13:12:38.702] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis.ml.python] T-mgjMlxna - [ct: 84515] Processing thread joined ...
[2023/08/04-13:12:38.706] [FT-TrainWorkThread-BWE8Tw2G-1704] [INFO] [dku.analysis] T-mgjMlxna - [ct: 84519] Train done
Operating system used: Windows (10 Enterprise)
Hi @Pranay,
It appears that there is a character encoding issue. Please check your dataset for non-ASCII characters. One way to remove them is to add a Prepare recipe and use the "Simplify text" or "Transform string" processor.
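If you would rather check the data in code first, here is a minimal Python sketch. It is not Dataiku's own implementation of those processors, just plain standard-library Python, and the helper names `find_non_ascii` and `to_ascii` and the sample rows are made up for illustration. It also assumes the "charmap" in the error comes from a Windows code page such as cp1252, which is a common cause on Windows but is not confirmed by the logs above.

```python
import unicodedata


def find_non_ascii(values):
    """Return (index, value) pairs for strings that contain non-ASCII characters."""
    return [
        (i, v)
        for i, v in enumerate(values)
        if isinstance(v, str) and not v.isascii()
    ]


def to_ascii(text):
    """Decompose accented characters, then drop anything still outside ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical sample values standing in for a text column of the dataset.
rows = ["Café Müller", "plain text", "naïve → résumé"]

# Assumption: the failing encoder is a Windows code page like cp1252.
# Characters it cannot represent raise a 'charmap' UnicodeEncodeError.
charmap_error_raised = False
try:
    "→".encode("cp1252")
except UnicodeEncodeError:
    charmap_error_raised = True

flagged = find_non_ascii(rows)          # rows that need cleaning
cleaned = [to_ascii(r) for r in rows]   # ASCII-only versions of every row
```

Note that `to_ascii` is lossy: accents are stripped and symbols with no ASCII equivalent (like the arrow) are dropped entirely, so it is worth reviewing the flagged rows before overwriting anything.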
Let me know if you have any questions.
Thanks!
Jordan
Thanks for the solution. For the short term, I manually removed the non-ASCII characters from my small dataset, but the long-term solution of using "Transform string" works perfectly!