Correcting Typos - Text Preparation Plugin?
Hi everyone,
Context: I have data from a survey. One question in the survey is multiple choice with predefined answers, but there is no data validation built into the survey. As a result, I have typos in the data. For example, a column "Genre" can include "Rcok, Clasisc, Jaz".
Question: Is there a smart/quick way to correct spelling mistakes in Dataiku?
Attempted solution: I tried using the Text Preparation Plugin but I get the following error: Error in Python process: At line 4: <class 'ModuleNotFoundError'>: No module named 'regex'. Could this be related to the code environment? I'm using Python 3.11, but the Text Preparation Plugin seems to work with version 3.6 or 3.7.
Thanks!
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,248 Dataiker
Hi,
The plugin works with the Python versions listed here: https://github.com/dataiku/dss-plugin-nlp-preparation/blob/main/code-env/python/desc.json
Currently: "PYTHON36", "PYTHON37", "PYTHON38", "PYTHON39". Python 3.11 is not among them.
Could you try with one of these Python versions and see if it works for you?
Thanks
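Separately from the plugin: since the valid answers are predefined, the typos could also be mapped back to the closest allowed value with fuzzy matching. Below is a minimal sketch using Python's standard-library difflib, which could be adapted for a Python recipe. The list VALID_GENRES, the helper names correct_value and correct_cell, and the 0.6 cutoff are assumptions for illustration only; they are not part of the plugin or of the survey data.

```python
import difflib

# Hypothetical list of the survey's predefined answers (assumption for illustration).
VALID_GENRES = ["Rock", "Classic", "Jazz"]


def correct_value(value, valid=VALID_GENRES, cutoff=0.6):
    """Return the closest predefined answer, or the stripped input if none is close enough."""
    cleaned = value.strip()
    # Compare case-insensitively, but return the canonical spelling.
    lookup = {v.lower(): v for v in valid}
    matches = difflib.get_close_matches(cleaned.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else cleaned


def correct_cell(cell, sep=","):
    """Correct every answer in a multi-value cell such as 'Rcok, Clasisc, Jaz'."""
    return ", ".join(correct_value(part) for part in cell.split(sep))


if __name__ == "__main__":
    print(correct_cell("Rcok, Clasisc, Jaz"))
```

Values that are not close to any predefined answer are left unchanged, so they can be reviewed by hand. The cutoff is a trade-off: a higher value corrects fewer typos but is less likely to map an unrelated answer to the wrong genre.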