Issues in Running Similarity Search Plug-in
I am getting the following error when I try to run the Similarity Search recipe in Dataiku DSS.
We have tried multiple methods (GloVe, ELMo, FastText, Word2Vec) but still end up with the same error. The Sentence Embedding recipe works fine, but when we try to use Similarity Search it throws the error below.
Any pointers would help.
----------------------------------------------- Error Logs --------------------------------------------------------
search_managed/bin/python","cpuUserTimeMS":0,"cpuSystemTimeMS":0,"cpuChildrenUserTimeMS":0,"cpuChildrenSystemTimeMS":0,"cpuTotalMS":0,"cpuCurrent":0.0,"vmSizeMB":120,"vmRSSMB":4,"vmHWMMB":4,"vmRSSAnonMB":1,"vmDataMB":1,"vmSizePeakMB":121,"vmRSSPeakMB":4,"vmRSSTotalMBS":0,"majorFaults":2,"childrenMajorFaults":0}}
[2021/05/20-09:07:13.930] [FRT-33-FlowRunnable] [INFO] [dku.recipes.code.base] act.compute_R0KAzeJy_NP - Error file found, trying to throw it: /home/dataiku/dss/jobs/PLUGIN_NLP/Build_word2vec_NNSI_2021-05-20T09-07-05.683/compute_R0KAzeJy_NP/custom-python-recipe/pyout2BA6E19kip4w/error.json
[2021/05/20-09:07:13.930] [FRT-33-FlowRunnable] [INFO] [dku.recipes.code.base] act.compute_R0KAzeJy_NP - Raw error is{"errorType":"\u003cclass \u0027TypeError\u0027\u003e","message":"a bytes-like object is required, not \u0027str\u0027","detailedMessage":"At line 35: \u003cclass \u0027TypeError\u0027\u003e: a bytes-like object is required, not \u0027str\u0027","stackTrace":[]}
[2021/05/20-09:07:13.931] [FRT-33-FlowRunnable] [INFO] [dku.recipes.code.base] act.compute_R0KAzeJy_NP - Now err: {"errorType":"\u003cclass \u0027TypeError\u0027\u003e","message":"Error in Python process: a bytes-like object is required, not \u0027str\u0027","detailedMessage":"Error in Python process: At line 35: \u003cclass \u0027TypeError\u0027\u003e: a bytes-like object is required, not \u0027str\u0027","stackTrace":[]}
[2021/05/20-09:07:13.934] [FRT-33-FlowRunnable] [INFO] [dku.flow.activity] act.compute_R0KAzeJy_NP - Run thread failed for activity compute_R0KAzeJy_NP
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: At line 35: <class 'TypeError'>: a bytes-like object is required, not 'str'
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleErrorFile(AbstractCodeBasedActivityRunner.java:221)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:186)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:48)
at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:71)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2021/05/20-09:07:13.990] [ActivityExecutor-28] [INFO] [dku.flow.activity] running compute_R0KAzeJy_NP - activity is finished
[2021/05/20-09:07:13.992] [ActivityExecutor-28] [ERROR] [dku.flow.activity] running compute_R0KAzeJy_NP - Activity failed
com.dataiku.common.server.APIError$SerializedErrorException: Error in Python process: At line 35: <class 'TypeError'>: a bytes-like object is required, not 'str'
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleErrorFile(AbstractCodeBasedActivityRunner.java:221)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.handleExecutionResult(AbstractCodeBasedActivityRunner.java:186)
at com.dataiku.dip.dataflow.exec.AbstractCodeBasedActivityRunner.execute(AbstractCodeBasedActivityRunner.java:103)
at com.dataiku.dip.dataflow.exec.AbstractPythonRecipeRunner.executeScript(AbstractPythonRecipeRunner.java:48)
at com.dataiku.dip.recipes.customcode.CustomPythonRecipeRunner.run(CustomPythonRecipeRunner.java:71)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner$FlowRunnableThread.run(ActivityRunner.java:374)
[2021/05/20-09:07:13.992] [ActivityExecutor-28] [INFO] [dku.flow.activity] running compute_R0KAzeJy_NP - Executing default post-activity lifecycle hook
[2021/05/20-09:07:13.994] [ActivityExecutor-28] [INFO] [dku.flow.activity] running compute_R0KAzeJy_NP - Done post-activity tasks
Answers
-
Hi,
Which DSS version are you using? Please note that to use this plugin, you need to be on DSS 8.0.2 or higher.
If you are on DSS 8.0.2 or above, could you please open a new support ticket and attach an instance diagnosis?
Thanks,
Kim
-
rennyjosetm
Unfortunately we are on DSS version 8.0.0. Is there any workaround we can use, or are there any similar plug-ins that we could look at?
Appreciate the quick response
-
Hi @rennyjosetm,
Unfortunately there is no workaround for this other than upgrading. There are no similar plugins to use as an alternative either.
Thanks,
Kim
-
The error means that a str was passed where Python 3 expected a bytes-like object. In Python 3, strings are Unicode text, while data read from or written to binary files, sockets, and similar sources is bytes, and the two cannot be mixed without an explicit conversion. To get bytes from a string, call the str method encode(); to get a string from bytes, call the bytes method decode(). Both default to "utf-8" in Python 3, but the encoding can also be passed explicitly:
"python string to bytes".encode("utf-8")
b"python byte to string".decode("utf-8")
Python makes a clear distinction between bytes and strings. Bytes objects contain raw data (a sequence of octets), whereas strings are Unicode sequences. Conversion between the two types is explicit: you encode a string to get bytes, specifying an encoding (which defaults to UTF-8), and you decode bytes to get a string. Callers should be aware that such conversions may fail (for example with a UnicodeDecodeError on invalid input) and should decide how those failures are handled.
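For anyone who wants to see the message from the log outside of DSS, here is a minimal sketch in plain Python 3. It is not the plugin's code (line 35 of the recipe is not shown in this thread), and the helper names to_bytes, to_text and reproduce_error are made up for illustration. It only reproduces the same TypeError and shows the explicit conversions described above.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a str (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode(encoding)
    return bytes(value)


def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding, errors)
    return str(value)


def reproduce_error():
    """Mix bytes and str on purpose and return the resulting error message."""
    raw = b"glove,elmo,fasttext,word2vec"
    try:
        # The separator is a str but the data is bytes, so Python 3 raises TypeError.
        raw.split(",")
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce_error())
    # Converting one side so that both are the same type avoids the error.
    raw = b"glove,elmo,fasttext,word2vec"
    print(raw.split(to_bytes(",")))
    print(to_text(raw).split(","))
```

Passing errors="replace" or errors="ignore" to to_text is one way to handle input that is not valid UTF-8, at the cost of silently losing information. None of this changes the answer above: for the plugin itself, the fix is still to upgrade DSS.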