Practical use of the Text Preparation Plugin

tgb417

I'm working with short free-text answers to questions that people have typed by hand into online forms. As is typical with hand-entered data, there are typographical errors all over the place. A small percentage of the people who fill out these forms respond in languages other than English.

I'm trying to use the Dataiku Text Preparation plugin, and I'm finding a large number of errors. Language identification is of poor quality, so the multilingual spelling correction that relies on it is poor as well. I haven't even tried language translation yet.
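To make the dependency between the two steps concrete, here is a rough, standard-library-only Python sketch of the kind of pipeline I mean: guess the language first, then correct spelling against that language's word list. This is not how the plugin works internally. The tiny STOPWORDS and VOCAB tables, the function names, and the 0.8 similarity cutoff are all made up for illustration.

```python
import difflib

# Hypothetical, tiny per-language resources (illustration only).
STOPWORDS = {
    "en": {"the", "and", "is", "to", "of", "in", "my", "i", "for", "with"},
    "es": {"el", "la", "y", "es", "de", "en", "mi", "yo", "para", "con"},
}
VOCAB = {
    "en": ["help", "with", "my", "application", "housing", "need", "i", "the", "for"],
    "es": ["ayuda", "con", "mi", "solicitud", "vivienda", "necesito", "la", "para"],
}


def guess_language(text, default="en"):
    """Pick the language whose stopwords appear most often in the text.

    Falls back to `default` when no stopword matches at all, which is
    common for very short answers.
    """
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def correct_spelling(text, lang, cutoff=0.8):
    """Replace each unknown token with its closest word in the language's vocabulary."""
    vocab = VOCAB[lang]
    corrected = []
    for token in text.lower().split():
        if token in vocab:
            corrected.append(token)
            continue
        # difflib returns at most n candidates whose similarity ratio is >= cutoff.
        match = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


def prepare(text):
    """Language identification followed by language-specific spelling correction."""
    lang = guess_language(text)
    return lang, correct_spelling(text, lang)


if __name__ == "__main__":
    print(prepare("i need help with my aplication"))
    print(prepare("necesito ayuda con mi solicitd"))
    # A short answer with no stopwords falls back to English,
    # so the Spanish vocabulary is never consulted.
    print(prepare("ayuda vivienda"))
```

The last call shows the problem in miniature: when the language guess is wrong, the spelling step checks the text against the wrong word list, so a bad first step makes everything downstream bad too.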

I'm on a non-profit budget right now. I'm wondering if there are folks out there who are dealing with this kind of challenge without resorting to sending the data to service providers like Google, Amazon, or Azure. And if there is no other way to get this kind of thing done, which provider have folks found most economical?

Operating system used: macOS Ventura 13.0.1
