I get an error when applying custom preprocessing to a text feature in a VisualML model:
Traceback (most recent call last):
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/server.py", line 47, in serve
[2020/07/14-14:42:34.999] [MRT-16917] [INFO] [dku.block.link.interaction] - Check result for nullity exceptionIfNull=true result=null
    ret = api_command(arg)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/dkuapi.py", line 45, in aux
    return api(**kwargs)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/commands.py", line 311, in train_prediction_models_nosave
    preproc_handler.save_data()
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/preprocessing_handler.py", line 166, in save_data
    self._save_resource(resource_name)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/preprocessing_handler.py", line 106, in _save_resource
    pickle.dump(resource, resource_file, 2)
_pickle.PicklingError: Can't pickle <class 'StringIdentity'>: attribute lookup StringIdentity on builtins failed
The processing snippet looks like this:
import numpy as np
import pandas as pd

class StringIdentity:
    def __init__(self, names=["DefaultName"]):
        self.names = names

    def fit(self, series):
        pass

    def transform(self, series):
        a = pd.DataFrame(series.map(lambda x : np.array([x])), columns=self.names)
        return a

processor = StringIdentity(["path"])
The purpose is to pass the unchanged string as an input feature.
What could be the issue here? How can I change the code so that it works with pickle?
Thank you!
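For readers hitting the same message: pickle stores a class by reference (module name plus class name), so the class must be importable from a real module. The sketch below reproduces a PicklingError of the same kind outside DSS. It assumes, without confirmation from the DSS source, that the snippet is executed in a bare namespace; the `exec` call here only mimics that and is not the actual DSS mechanism.

```python
import pickle

snippet = """
class StringIdentity:
    def __init__(self, names):
        self.names = names
"""

# Run the snippet in a bare namespace, as an embedded code editor might do
# (assumption: this imitates, but is not, the DSS execution mechanism).
namespace = {}
exec(snippet, namespace)
processor = namespace["StringIdentity"](["path"])

# With no __name__ in the namespace, the class reports 'builtins' as its module.
class_module = namespace["StringIdentity"].__module__

try:
    pickle.dumps(processor, protocol=2)
    pickle_error = None
except pickle.PicklingError as exc:
    # pickle cannot find StringIdentity inside the builtins module.
    pickle_error = exc

print(class_module)
print(pickle_error)
```

If this is indeed what happens in DSS, the class itself is fine; what pickle is missing is a module from which it can re-import the class by name.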
I understand now that I am supposed to put the class in a file in the "lib/python" directory (from https://doc.dataiku.com/dss/latest/machine-learning/features-handling/custom.html). Is this directory located in the DSS DATA_DIR on the local file system? How can a normal user (i.e. not an admin) add a custom preprocessor?
Hi,
The custom preprocessing can also be put in the per-project libraries editor (https://doc.dataiku.com/dss/latest/python/reusing-code.html#sharing-python-code-within-a-project)
However, from reading your code, am I correct that it would keep string data in the output? A model can only be trained on purely numerical data, so your preprocessing needs to encode the strings as numbers in some way.
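The idea of moving the class into a library module can be sketched outside DSS as follows. This is only an illustration under assumptions: a temporary directory stands in for the library folder (global or per-project), the module name `stringidentity` is taken from later in this thread, and the class is cut down to its constructor. Inside DSS only the last part, the import and the `processor` assignment, would go in the preprocessing snippet.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Stand-in for a library directory that is on sys.path (hypothetical location).
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "stringidentity.py"), "w") as f:
    f.write(
        "class StringIdentity:\n"
        "    def __init__(self, names):\n"
        "        self.names = names\n"
    )
sys.path.insert(0, lib_dir)
importlib.invalidate_caches()

# What the preprocessing snippet would then contain:
from stringidentity import StringIdentity
processor = StringIdentity(["path"])

# The class now lives in a real module, so pickle can store a reference to it.
restored = pickle.loads(pickle.dumps(processor, protocol=2))
print(StringIdentity.__module__, restored.names)
```

The round trip only works while `stringidentity` stays importable, which is why the library directory has to be on the Python path of the process that trains and later loads the model.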
Thank you for your reply. I am developing a plugin for VisualML and it requires the filename as an input. BTW, should it be possible to create a module in the plugin python-lib and use it in the custom preprocessor?
I tried both described methods (first, a directory named after the module with an __init__.py inside; second, a file named after the module (stringidentity.py) in the root directory), but alas neither worked. I previously had it working with a global Python file (DATA_DIR/lib/python/stringidentity.py); after removing it and trying the per-project library, it stopped working. Do I have to restart some services in order to register the per-project lib?
I double checked to be sure and ran
import sys
for path in sys.path:
    print("PATH: {}".format(path))
The result is:
PATH:
PATH: /ws/dss/lib/python
PATH: /ws/dataiku-dss-7.0.2/python
PATH: /ws/dataiku-dss-7.0.2/dku-jupyter/packages
PATH: /ws/dss/tmp/ml-plugins-lib/159479944002615163898647754708507
PATH: /ws/dss/plugins/dev/visual-project/python-lib
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python36.zip
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/lib-dynload
PATH: /usr/lib/python3.6
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/site-packages
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/site-packages/IPython/extensions
So it seems that the plugin-lib should work but the project-lib does not?!
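A more direct check than reading sys.path by eye is to ask Python where it would load the module from. The helper below is a hypothetical diagnostic (the name `locate_module` is made up for this sketch); run from the same snippet, it would show whether `stringidentity` is visible at all and, if so, from which directory.

```python
import importlib.util

def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# None means no directory on sys.path provides the module.
print("stringidentity ->", locate_module("stringidentity"))
# A standard-library module, for comparison.
print("json ->", locate_module("json"))
```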