
'Impute missing values' does not work


I am trying to do a simple median or mode imputation on a column of integer values, using an 'Impute missing values' step in a visual Prepare recipe. The script output shows the column with correctly imputed values. However, after I run the recipe, this transformation step seems to have no effect, although the other steps in the same Prepare recipe are computed correctly. The recipe runs on the Spark engine, because this transformation step does not work with the normal DSS engine. The job does not show any errors. In the logs, I can also see that this transformation step is at least part of the job definition:

[2021/05/26-17:09:56.148] [null-err-55] [INFO] [dku.utils]  -     {
[2021/05/26-17:09:56.148] [null-err-55] [INFO] [dku.utils]  -       "type": "FillEmptyWithComputedValue",
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -       "params": {
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -         "mode": "MEDIAN",
[2021/05/26-17:09:56.149] [null-err-55] [INFO] [dku.utils]  -         "appliesTo": "COLUMNS",
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -         "columns": [
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -           "XXX",
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -           "YYY"
[2021/05/26-17:09:56.150] [null-err-55] [INFO] [dku.utils]  -         ]
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       },
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       "metaType": "PROCESSOR",
[2021/05/26-17:09:56.151] [null-err-55] [INFO] [dku.utils]  -       "preview": false,
[2021/05/26-17:09:56.152] [null-err-55] [INFO] [dku.utils]  -       "disabled": false,
[2021/05/26-17:09:56.152] [null-err-55] [INFO] [dku.utils]  -       "comment": "Impute XXX and YYY",
[2021/05/26-17:09:56.153] [null-err-55] [INFO] [dku.utils]  -       "alwaysShowComment": true
[2021/05/26-17:09:56.153] [null-err-55] [INFO] [dku.utils]  -     }

I'm working with DSS 8.0.2.

Any idea why this is not working?


Thanks for the detailed analysis. We could reproduce your issue and we are investigating it.

In the meantime, you can achieve a similar behavior by:
1. Right-clicking on the column that you want to impute
2. Choosing "Fill empty rows with..."
3. Choosing the option you want (mode, mean, or constant value)

However, the mode/mean value will be calculated on the current sample and will then be fixed (it is not recomputed on possibly new input data when the recipe runs). So the behavior is slightly different from the "Impute missing values" processor you are using.
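
To make the difference concrete, here is a minimal plain-Python sketch. It is not Dataiku code: the function names and the sample data are hypothetical and only illustrate the idea. One function fills empty cells with a value that was computed once on a sample and then frozen, the other recomputes the mean on whatever data it is given.

```python
from statistics import mean

def fill_with_constant(values, constant):
    """Replace missing entries (None) with a value fixed ahead of time."""
    return [constant if v is None else v for v in values]

def impute_with_mean(values):
    """Replace missing entries (None) with the mean of the data actually passed in."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to compute a mean from; return the data unchanged.
        return list(values)
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical data: the sample seen in the editor vs. the full input at run time.
sample = [1, 2, None, 3]
full_data = [1, 2, None, 3, 100, 200, None]

# Workaround-style behavior: the fill value is computed on the sample and frozen.
frozen_value = mean([v for v in sample if v is not None])
fixed_result = fill_with_constant(full_data, frozen_value)

# Processor-style behavior: the fill value is recomputed on the full input.
recomputed_result = impute_with_mean(full_data)

print(fixed_result)
print(recomputed_result)
```

When the sample is representative of the full input, both approaches give similar results. They diverge when the input data changes or the sample is skewed, as in this deliberately exaggerated example.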

Hope it helps,
