PicklingError for Custom Preprocessing (Text)

rmios Registered Posts: 19 ✭✭✭✭
edited July 16 in General Discussion

I get an error when using custom preprocessing on a text feature in a VisualML model:

Traceback (most recent call last):
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/server.py", line 47, in serve
    ret = api_command(arg)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/dkuapi.py", line 45, in aux
    return api(**kwargs)
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/commands.py", line 311, in train_prediction_models_nosave
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/preprocessing_handler.py", line 166, in save_data
  File "/ws/dataiku-dss-7.0.2/python/dataiku/doctor/preprocessing_handler.py", line 106, in _save_resource
    pickle.dump(resource, resource_file, 2)
_pickle.PicklingError: Can't pickle <class 'StringIdentity'>: attribute lookup StringIdentity on builtins failed

The preprocessing snippet looks like this:

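import numpy as np
import pandas as pd

class StringIdentity:
    def __init__(self, names=None):
        # Avoid a mutable default argument; fall back to one default column name.
        self.names = names if names is not None else ["DefaultName"]

    def fit(self, series):
        # Stateless pass-through: there is nothing to learn from the data.
        return self

    def transform(self, series):
        # Wrap each original string in a one-element numpy array. Building the
        # frame from a dict keeps the output column named self.names[0],
        # whatever the name of the incoming Series is.
        wrapped = series.map(lambda x: np.array([x]))
        return pd.DataFrame({self.names[0]: wrapped}, index=series.index)

processor = StringIdentity(["path"])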

The purpose is to pass the unchanged string as an input feature.

What could be the issue here? How can I change the code so that it works with pickle?

Thank you!
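
The error says pickle tried to find StringIdentity in the builtins module. Pickle stores a class by reference (module name plus class name) and re-imports it on load, so a class that only exists inside an evaluated snippet cannot be looked up. The sketch below reproduces the same message in plain Python. It assumes, without confirmation from DSS itself, that the preprocessing snippet is run with exec() in a fresh namespace; SNIPPET is a hypothetical stand-in for that code.

```python
import pickle

# Hypothetical stand-in for the custom preprocessing snippet.
SNIPPET = """
class StringIdentity:
    def __init__(self, names=None):
        self.names = names if names is not None else ["DefaultName"]

processor = StringIdentity(["path"])
"""


def reproduce():
    """Run SNIPPET the way an exec()-based host might, then try to pickle it."""
    namespace = {}  # fresh globals with no __name__ entry
    exec(SNIPPET, namespace)
    cls = namespace["StringIdentity"]
    # The class body looks up __name__; with none in the globals it falls back
    # to the builtins module, so the class records "builtins" as its module.
    print("module recorded for the class:", cls.__module__)
    try:
        # Protocol 2, as in the traceback's pickle.dump(resource, resource_file, 2).
        pickle.dumps(namespace["processor"], protocol=2)
    except pickle.PicklingError as exc:
        return str(exc)
    return None
```

Calling reproduce() returns the pickle error message, which names StringIdentity and the builtins lookup, in the same way as the traceback above.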

Best Answer


  • rmios

    I understand now that I am supposed to put the class in a file in the "lib/python" directory (from https://doc.dataiku.com/dss/latest/machine-learning/features-handling/custom.html). Is this directory located in the DSS DATA_DIR on the local file system? How can a normal user (i.e. not an admin) add a custom preprocessor?
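A minimal sketch of what moving the class into a library changes: once StringIdentity lives in an importable module, pickle records it as stringidentity.StringIdentity and can re-import it when the model is loaded. Assumptions: a temporary directory stands in for DATA_DIR/lib/python (or a project library), the module name stringidentity is the one used later in this thread, and the transform body is a simplified pass-through with no numpy/pandas import so the sketch runs on its own. The preprocessing snippet itself would then only need the import and the `processor = StringIdentity(["path"])` line.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Contents of a hypothetical stringidentity.py placed on the Python path.
MODULE_SOURCE = '''
class StringIdentity:
    """Pass a text column through unchanged (stateless)."""

    def __init__(self, names=None):
        self.names = names if names is not None else ["DefaultName"]

    def fit(self, series):
        return self

    def transform(self, series):
        # Simplified: one column holding the original strings.
        return series.to_frame(name=self.names[0])
'''


def roundtrip():
    # Stand-in for DATA_DIR/lib/python or a project library folder.
    lib_dir = Path(tempfile.mkdtemp())
    (lib_dir / "stringidentity.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, str(lib_dir))
    importlib.invalidate_caches()  # make the new file visible to the import system

    # This import is all the preprocessing snippet would need.
    from stringidentity import StringIdentity

    processor = StringIdentity(["path"])
    data = pickle.dumps(processor, protocol=2)  # same protocol as in the traceback
    restored = pickle.loads(data)  # re-imports stringidentity.StringIdentity
    return type(restored).__module__, restored.names
```

The design point is only where the class is defined; the same round trip fails if the module directory is not on sys.path in the process that unpickles the model.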

  • rmios

    Thank you for your reply. I am developing a plugin for VisualML, and it requires the filename as an input. By the way, is it possible to create a module in the plugin's python-lib and use it in the custom preprocessor?

    I tried both of the described methods (first, a directory named after the module with an __init__.py inside; second, a file named after the module (stringidentity.py) in the root directory), but unfortunately neither worked. It previously worked with a global Python file (DATA_DIR/lib/python/stringidentity.py), but after I removed it and switched to the per-project library, it stopped working. Do I have to restart some services for the per-project library to be registered?

  • rmios
    edited July 17

    To be sure, I double-checked by running:

    import sys
    for path in sys.path:
        print("PATH: {}".format(path))

    The result is:

    PATH: /ws/dss/lib/python
    PATH: /ws/dataiku-dss-7.0.2/python
    PATH: /ws/dataiku-dss-7.0.2/dku-jupyter/packages
    PATH: /ws/dss/tmp/ml-plugins-lib/159479944002615163898647754708507
    PATH: /ws/dss/plugins/dev/visual-project/python-lib
    PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python36.zip
    PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6
    PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/lib-dynload
    PATH: /usr/lib/python3.6
    PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/site-packages
    PATH: /ws/dss/code-envs/python/TensorFlow2/lib/python3.6/site-packages/IPython/extensions

    So it seems that the plugin python-lib is on the path, but the project library is not?!
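A more targeted check than printing sys.path, offered as a sketch: ask the import system directly whether a module resolves in the current environment, and from which file. It would need to run in the same environment as the preprocessor (for example a notebook on the same code env); `stringidentity` is the module name from this thread, and the helper names are hypothetical.

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def matching_path_entries(fragment):
    """List sys.path entries whose text contains `fragment`."""
    return [entry for entry in sys.path if fragment in entry]


if __name__ == "__main__":
    print("stringidentity ->", locate("stringidentity"))
    print("python-lib entries:", matching_path_entries("python-lib"))
```

If locate("stringidentity") returns None, no directory on sys.path provides the module, which matches the symptom of the project library not appearing in the listing above.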
