PicklingError for Custom Preprocessing (Text)

Level 3

I get an error when applying custom preprocessing to a text feature in a VisualML model:

Traceback (most recent call last):
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/", line 47, in serve
[2020/07/14-14:42:34.999] [MRT-16917] [INFO] []  - Check result for nullity exceptionIfNull=true result=null
    ret = api_command(arg)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/", line 45, in aux
    return api(**kwargs)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/", line 311, in train_prediction_models_nosave
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/", line 166, in save_data
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/", line 106, in _save_resource
    pickle.dump(resource, resource_file, 2)
_pickle.PicklingError: Can't pickle <class 'StringIdentity'>: attribute lookup StringIdentity on builtins failed

The processing snippet looks like this:

import numpy as np
import pandas as pd

class StringIdentity:
    def __init__(self, names=["DefaultName"]):
        self.names = names

    def fit(self, series):
        # Nothing to learn: the string is passed through unchanged.
        pass

    def transform(self, series):
        a = pd.DataFrame(series.apply(lambda x: np.array([x])), columns=self.names)
        return a

processor = StringIdentity(["path"])

The purpose is to pass the unchanged string as an input feature.

What could be the issue here? How can I change the code so that it works with pickle?

Thank you!

4 Replies
Level 3

I understand now that I am supposed to put the class in a file in the "lib/python" directory. Is this directory located on the local file system, under the DSS DATA_DIR? How can a normal user (i.e. not an admin) add a custom preprocessor?
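
For anyone hitting the same error, here is a minimal sketch of what the traceback suggests is happening. It assumes (this is not confirmed anywhere in the thread) that the snippet is executed in a namespace that has no `__name__`, so the class ends up with `__module__ == "builtins"` and pickle cannot look it up by name. The module name `string_identity_lib` and the temporary directory standing in for "lib/python" are hypothetical.

```python
import importlib
import os
import pickle
import sys
import tempfile

CLASS_SOURCE = '''
class StringIdentity:
    def __init__(self, names=("DefaultName",)):
        self.names = list(names)
'''

# 1) Class defined via exec() in a bare namespace
#    (assumption: this mimics how the snippet editor runs the code).
snippet_ns = {}
exec(CLASS_SOURCE, snippet_ns)
inline_cls = snippet_ns["StringIdentity"]
# With no __name__ in the namespace, the lookup falls through to builtins.
inline_module = inline_cls.__module__

try:
    pickle.dumps(inline_cls(["path"]), 2)
    inline_pickles = True
except (pickle.PicklingError, AttributeError):
    # pickle stores classes by reference and cannot find builtins.StringIdentity.
    inline_pickles = False

# 2) Same class placed in an importable module
#    (the temp dir is a stand-in for lib/python).
lib_dir = tempfile.mkdtemp()
with open(os.path.join(lib_dir, "string_identity_lib.py"), "w") as f:
    f.write(CLASS_SOURCE)
sys.path.insert(0, lib_dir)
importlib.invalidate_caches()
from string_identity_lib import StringIdentity

processor = StringIdentity(["path"])
restored = pickle.loads(pickle.dumps(processor, 2))
print(inline_module, inline_pickles, restored.names)
```

The point of the second half is that pickle only records the module and class name, so the class has to be importable under that name both when the model is saved and when it is loaded again.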


The custom preprocessing class can also be put in the per-project libraries editor.

However, from reading your code, am I correct that it would keep string data in the output? A model can only be trained on purely numerical data, so your preprocessing needs to encode the string as numerical values in some way.
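
To illustrate the point about numerical output, here is one possible sketch that turns each string into a fixed number of numeric columns by hashing character trigrams. It assumes the same `fit(series)` / `transform(series)` interface as the snippet in the question. The class name `HashedStringEncoder`, the `prefix` and `n_buckets` parameters, and the choice of hashing are all illustrative assumptions, not something DSS prescribes.

```python
import zlib

import numpy as np
import pandas as pd


class HashedStringEncoder:
    """Hypothetical preprocessor: hashes character trigrams into numeric columns."""

    def __init__(self, prefix="path", n_buckets=8):
        self.n_buckets = n_buckets
        self.names = ["{}_h{}".format(prefix, i) for i in range(n_buckets)]

    def fit(self, series):
        # Hashing is stateless, so there is nothing to learn here.
        return self

    def _encode(self, text):
        counts = np.zeros(self.n_buckets)
        text = str(text)
        for i in range(max(len(text) - 2, 1)):
            trigram = text[i:i + 3]
            # crc32 is deterministic across runs, unlike the built-in hash().
            counts[zlib.crc32(trigram.encode("utf-8")) % self.n_buckets] += 1
        return counts

    def transform(self, series):
        # One row of trigram counts per input string, all columns numeric.
        rows = [self._encode(value) for value in series]
        return pd.DataFrame(rows, columns=self.names, index=series.index)


encoder = HashedStringEncoder(prefix="path", n_buckets=8)
features = encoder.transform(
    pd.Series(["/data/img/cat_001.png", "/data/img/dog_002.png"])
)
print(features.shape)
```

As with the original class, this would need to live in an importable module rather than in the snippet itself for pickling to work.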

Level 3

Thank you for your reply. I am developing a plugin for VisualML, and it requires the filename as an input. By the way, is it possible to create a module in the plugin's python-lib and use it in the custom preprocessor?

I tried both of the described methods (first, a directory named after the module; second, a file named after the module in the root directory), but alas, neither worked. I previously had it working with a global Python file (under DATA_DIR/lib/python/); after removing that file and trying the per-project library, it stopped working. Do I have to restart some services in order to register the per-project lib?

Level 3

I double checked to be sure and ran

import sys
for path in sys.path:
    print("PATH: {}".format(path))

The result is:

PATH: /ws/dss/lib/python
PATH: /ws/dataiku-dss-7.0.2/python
PATH: /ws/dataiku-dss-7.0.2/dku-jupyter/packages
PATH: /ws/dss/tmp/ml-plugins-lib/159479944002615163898647754708507
PATH: /ws/dss/plugins/dev/visual-project/python-lib
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/lib-dynload
PATH: /usr/lib/python3.6
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/site-packages
PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/site-packages/IPython/extensions

So it seems that the plugin-lib should work, but the project-lib does not?!
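
A related check that may help narrow this down: instead of only listing `sys.path`, ask Python where a given module would actually be imported from. This is only a sketch; `my_preprocessing` is a placeholder for whatever the module is really called, and `locate_module` is a helper made up for this example.

```python
import importlib.util


def locate_module(module_name):
    """Return the file a top-level module would be imported from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


# "my_preprocessing" is a placeholder; replace it with the real module name.
for name in ["json", "my_preprocessing"]:
    print("{}: {}".format(name, locate_module(name)))
```

If this prints `None` for the module when run from the same place as the preprocessing snippet, the per-project library directory is not on the path for that process.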
